Independent research and development in natural language processing, with a focus on low-resource languages, retrieval-augmented systems, and morphology-aware modeling. Publications and open-source releases.
| Repository | Description | Scope | Status |
|---|---|---|---|
| TurkishMorpheus | Morpheus: a neural morpheme-aware tokenizer and word embedder for Turkish. Lossless, low-BPC (bits per character), morphology-aware, and embedding-producing; the only such tokenizer usable in Turkish LLMs. | Research + Release | v4 complete, paper on arXiv |
| CorpusCollector | Reproducible Turkish corpus collection from Ekşi Sözlük, DergiPark, and news sites. Used to build the Morpheus training corpus. | Tooling | Stable |
| CogOS | Cognitive Operating System. | Research | Early |
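The "lossless" and "low-BPC" properties listed for Morpheus can be made concrete with a small sketch. It uses a hand-written, hypothetical segmentation of the Turkish word evlerimizde ("in our houses": ev + ler + imiz + de) and made-up token probabilities; it does not call Morpheus and does not reflect its actual output. Lossless means that detokenizing the pieces reproduces the input exactly, and BPC is the total negative log2-likelihood of the token sequence divided by the number of characters.

```python
import math

# Hypothetical morpheme segmentation (illustrative only, not Morpheus output):
# ev (house) + ler (plural) + imiz (our) + de (locative) = "in our houses"
text = "evlerimizde"
tokens = ["ev", "ler", "imiz", "de"]

# Made-up probabilities a language model might assign to each token.
probs = [0.5, 0.25, 0.125, 0.5]


def detokenize(pieces):
    """Join pieces back into a string; a lossless tokenizer recovers the input exactly."""
    return "".join(pieces)


def bits_per_character(token_probs, source_text):
    """BPC = total negative log2-likelihood of the tokens / number of characters."""
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / len(source_text)


assert detokenize(tokens) == text  # round-trip check for losslessness
bpc = bits_per_character(probs, text)
print(f"BPC = {bpc:.3f}")  # 7 bits over 11 characters -> BPC = 0.636
```

Normalizing by characters rather than tokens is what makes BPC comparable across tokenizers whose vocabularies and segmentation granularity differ.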
- 🤗 Model: Morpheus-TR-50K, a neural morpheme-aware tokenizer and word embedder for Turkish
- 🚀 Live demo: morpheus-tr-demo, an interactive segmentation and embedding explorer
2025
- Şakar, T., & Emekci, H. Maximizing RAG efficiency: A comparative analysis of RAG methods. Natural Language Processing, 31(1). Cambridge University Press. Link
2026
- Şakar, T. Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish. arXiv preprint. Link
- Low-resource and morphologically rich language modeling (especially Turkish and other agglutinative languages)
- Tokenization design for downstream language model efficiency
- Retrieval-augmented generation systems and their comparative efficiency
- Open, reproducible NLP pipelines: full data collection → training → evaluation chains
Research as a solo practitioner: complete pipelines released as working code, not slideware. Every model release ships with the corpus collector, training scripts, and evaluation suite that produced it, so anyone can reproduce or adapt it.
- 💼 LinkedIn
- 📧 Open an issue on any repository for technical discussion