
LoneWolf RD

Independent research and development in natural language processing, with a focus on low-resource languages, retrieval-augmented systems, and morphology-aware modeling. Solo-author publications and open-source releases.


Active Projects

| Repository | Description | Scope | Status |
| --- | --- | --- | --- |
| TurkishMorpheus | Morpheus: neural morpheme-aware tokenizer and word embedder for Turkish. Lossless, low-BPC, morphology-aware, embedding-producing; the only such tokenizer usable in Turkish LLMs. | Research + Release | v4 complete, Paper |
| CorpusCollector | Reproducible Turkish corpus collection from Ekşisözlük + Dergipark + news sites. Used to build the Morpheus training corpus. | Tooling | Stable |
| CogOS | Cognitive Operating System. | Research | Early |
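
As a rough illustration of two terms used for Morpheus above, "lossless" and "BPC" (bits per character), the sketch below uses a toy character-level tokenizer and an assumed uniform probability model. The names `toy_encode`, `toy_decode`, `is_lossless`, and `bits_per_character` are hypothetical and are not part of the Morpheus code or API; the sketch only shows how the two properties are commonly checked.

```python
import math


def is_lossless(encode, decode, text):
    """Return True when decoding the encoded text reproduces it exactly."""
    return decode(encode(text)) == text


def bits_per_character(token_log2_probs, text):
    """Total code length in bits (sum of -log2 p per token) per character."""
    total_bits = -sum(token_log2_probs)
    return total_bits / len(text)


# Toy tokenizer (an assumption for illustration): one token per character.
def toy_encode(text):
    return list(text)


def toy_decode(tokens):
    return "".join(tokens)


text = "evlerimizden"
tokens = toy_encode(text)

# Assumed model: uniform over a 32-symbol alphabet, so log2(1/32) per token.
log2_probs = [math.log2(1 / 32)] * len(tokens)

print(is_lossless(toy_encode, toy_decode, text))
print(bits_per_character(log2_probs, text))
```

A tokenizer that normalizes its input (for example, by lowercasing) would fail the round-trip check, which is why BPC comparisons are usually restricted to reversible tokenizers.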

Hugging Face

  • 🤗 Model: Morpheus-TR-50K — Neural morpheme-aware tokenizer and word embedder for Turkish
  • 🚀 Live demo: morpheus-tr-demo — interactive segmentation + embedding explorer

Publications

2025

  • Şakar, T., & Emekci, H. Maximizing RAG efficiency: A comparative analysis of RAG methods. Natural Language Processing, 31(1), Cambridge University Press. link

2026

  • Şakar, T. Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish. arXiv preprint. Link

Research Areas

  • Low-resource and morphologically-rich language modeling (esp. Turkish and other agglutinative languages)
  • Tokenization design for downstream language model efficiency
  • Retrieval-augmented generation systems and their comparative efficiency
  • Open, reproducible NLP pipelines — full data collection → training → evaluation chains

Philosophy

Research as a solo practitioner: complete pipelines released as working code, not slideware. Every model release ships with the corpus collector, training scripts, and evaluation suite that produced it, so anyone can reproduce or adapt the work.


Connect

  • 💼 LinkedIn
  • 📧 Open an issue on any repository for technical discussion

Popular repositories

  1. TurkishMorpheus (Public)

    Morpheus is the only lossless, morphology-aware tokenizer for Turkish that is usable in a generative LLM — and among reversible tokenizers it achieves the lowest BPC, while uniquely producing struc…

    Python 2

  2. .github (Public)

  3. CorpusCollector (Public)

    Multi-source Turkish corpus collection and cleaning pipeline — academic, news, and forum data, language-filtered and consolidated for training the Morpheus-TR morphology-aware tokenizer.

    Python
