- 💻 AI & ML Engineer building production-grade multimodal AI systems, autonomous agents, and NLP pipelines.
- 📖 Independent research on morphologically-aware neural tokenization, word representations, and Retrieval-Augmented Generation for low-resource / agglutinative languages — under lonewolf-rd.
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish — arXiv preprint, 2026 (sole author).
A lossless, morphology-aware neural tokenizer and word embedder for Turkish. A differentiable Poisson–binomial soft segmentation produces exact, surface-preserving morpheme splits (decode(encode(w)) = w), while the same forward pass yields structured word embeddings. Achieves the lowest BPC among reversible tokenizers and roughly 2× the morphological alignment of BPE/WordPiece/Unigram, and leads BERTurk and BGE-M3 on lexical retrieval.
Repo · Model · Demo · arXiv link coming soon
Maximizing RAG Efficiency: A Comparative Analysis of RAG Methods — Natural Language Processing, Cambridge University Press (SCI Q1), 2025. A grid-search study of 23,625 configurations across vector stores, embedding models, and LLMs on cross-domain data, quantifying the trade-offs between retrieval quality, similarity-based ranking, token usage, runtime, and hardware utilization. Shows that contextual compression filters substantially reduce token consumption and hardware load, at a similarity cost that is often acceptable depending on the RAG method and use case. Paper · PDF


