Tolga dfavenfre

Tolga Şakar — AI & ML Engineer · NLP Researcher

💻 AI & ML Engineer building production-grade multimodal AI systems, autonomous agents, and NLP pipelines.
📖 Independent research on morphologically aware neural tokenization, word representations, and Retrieval-Augmented Generation for low-resource and agglutinative languages, conducted under lonewolf-rd.

📰 Publications

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish — arXiv preprint, 2026 (sole author). A differentiable Poisson–binomial soft segmentation produces exact, surface-preserving morpheme splits, so tokenization is lossless (decode(encode(w)) = w), while the same forward pass yields structured word embeddings. Achieves the lowest bits per character (BPC) among reversible tokenizers and roughly 2× the morphological alignment of BPE, WordPiece, and Unigram, and leads BERTurk and BGE-M3 on lexical retrieval. Repo · Model · Demo · arXiv link coming soon
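The summary above does not spell out how the soft segmentation works, so the following is only a minimal sketch under stated assumptions, not Morpheus's implementation or API. It assumes each gap between adjacent characters carries an independent boundary probability p_i (hand-picked here rather than predicted by a model). The number of boundaries is then Poisson–binomial distributed, and its distribution follows from a dynamic program built only from sums and products, which is what lets such a formulation be differentiable in an autodiff framework. A hard split taken only at character gaps partitions the word, so concatenating the pieces restores it exactly. `encode` and `decode` mirror the notation above; `poisson_binomial_pmf`, the 0.5 threshold, and the probabilities are hypothetical.

```python
from typing import List


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """P(K = k) for K = number of successes among independent Bernoulli(p_i) trials.

    Standard O(n^2) recurrence. Each step only adds and multiplies, so the same
    computation is differentiable when written with autodiff tensors.
    """
    pmf = [1.0]  # before any gap is seen: P(K = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this gap is not a boundary
            nxt[k + 1] += mass * p      # this gap is a boundary
        pmf = nxt
    return pmf


def encode(word: str, boundary_probs: List[float], threshold: float = 0.5) -> List[str]:
    """Hard split: cut before character i when gap i's boundary probability exceeds threshold."""
    if len(boundary_probs) != len(word) - 1:
        raise ValueError("expected one boundary probability per gap between characters")
    pieces, start = [], 0
    for i, p in enumerate(boundary_probs, start=1):  # gap i lies between word[i-1] and word[i]
        if p > threshold:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


def decode(pieces: List[str]) -> str:
    """Pieces are contiguous slices that cover the word, so concatenation is exact."""
    return "".join(pieces)


if __name__ == "__main__":
    word = "evlerimizden"  # ev + ler + imiz + den, "from our houses"
    # Hypothetical boundary probabilities, one per gap (11 gaps for 12 characters).
    probs = [0.05, 0.9, 0.1, 0.2, 0.8, 0.1, 0.3, 0.1, 0.7, 0.1, 0.05]
    pieces = encode(word, probs)
    print(pieces)                   # ['ev', 'ler', 'imiz', 'den']
    print(decode(pieces) == word)   # True
    pmf = poisson_binomial_pmf(probs)
    expected = sum(k * m for k, m in enumerate(pmf))
    print(round(expected, 6))       # matches sum(probs), the mean of a Poisson–binomial
```

Because the split points are only ever gaps between characters, the round trip holds for any probabilities, not just well-calibrated ones; what a trained model adds is that the cuts land on real morpheme boundaries.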

Maximizing RAG Efficiency: A Comparative Analysis of RAG Methods — Natural Language Processing, Cambridge University Press (SCI Q1), 2025. A grid-search study of 23,625 configurations across vector stores, embedding models, and LLMs on cross-domain data, quantifying the trade-offs between retrieval quality, similarity-based ranking, token usage, runtime, and hardware utilization. Shows that contextual compression filters substantially reduce token consumption and hardware load, at a similarity cost that is often acceptable depending on the RAG method and use case. Paper · PDF
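The paper's exact filter configurations are not described here, so the following is a minimal sketch of one common kind of contextual compression filter, assumed for illustration: drop any retrieved chunk whose embedding has cosine similarity to the query below a cutoff, then compare context size before and after. The toy vectors, the 0.75 threshold, the function names, and the whitespace word count standing in for LLM tokens are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Chunk = Tuple[str, List[float]]  # (chunk text, embedding vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compress(query_vec: Sequence[float], chunks: List[Chunk], threshold: float = 0.75) -> List[Chunk]:
    """Keep only retrieved chunks whose embedding is at least `threshold`-similar to the query."""
    return [(text, vec) for text, vec in chunks if cosine(query_vec, vec) >= threshold]


def token_count(texts: List[str]) -> int:
    """Crude proxy: whitespace-separated words, not a real LLM tokenizer."""
    return sum(len(t.split()) for t in texts)


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]  # toy 3-d "embedding" of the user question
    retrieved = [
        ("Turkish is an agglutinative language.", [0.9, 0.1, 0.0]),
        ("Suffixes stack to mark case and possession.", [0.8, 0.3, 0.1]),
        ("The weather was sunny all week.", [0.1, 0.9, 0.4]),
    ]
    kept = compress(query, retrieved)
    before = token_count([t for t, _ in retrieved])
    after = token_count([t for t, _ in kept])
    print(f"chunks {len(retrieved)} -> {len(kept)}, tokens {before} -> {after}")
    # chunks 3 -> 2, tokens 18 -> 12
```

Raising the threshold shrinks the prompt further but risks discarding chunks that would have helped the answer, which is the general shape of the token-versus-quality trade-off the study quantifies.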

🖥️ Open-Source Projects

Machine Learning / AI Agents

Multi-Modal RAG
RAG Optimization
TalkYou
LLMRoboFund
Electricity Price Forecasting
Olivetti Face Recognition (CNN)
Fashion MNIST
MobileNetV1 Julia Implementation
EfficientNetV2 Transfer Learning (CNN)
Food Vision (CNN)
Econ Dashboard
Bitcoin Price Forecasting
Bike Sharing Demand Prediction
Financial Sentiment Classifier
Bank Customer Deposit Prediction
Credit Score Prediction