dfavenfre/README.md

Tolga Şakar — AI & ML Engineer · NLP Researcher


ORCID HuggingFace Kaggle

  • 💻 AI & ML Engineer building production-grade multimodal AI systems, autonomous agents, and NLP pipelines.
  • 📖 Independent research on morphologically-aware neural tokenization, word representations, and Retrieval-Augmented Generation for low-resource / agglutinative languages — under lonewolf-rd.

📰 Publications

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish
arXiv preprint, 2026 (sole author). A lossless, morphology-aware neural tokenizer and word embedder for Turkish. A differentiable Poisson–binomial soft segmentation produces exact, surface-preserving morpheme splits (decode(encode(w)) = w), while the same forward pass yields structured word embeddings. Achieves the lowest BPC among reversible tokenizers and roughly 2× the morphological alignment of BPE/WordPiece/Unigram, and leads BERTurk and BGE-M3 on lexical retrieval. Repo · Model · Demo · arXiv link coming soon

Maximizing RAG Efficiency: A Comparative Analysis of RAG Methods
Natural Language Processing, Cambridge University Press (SCI Q1), 2025. A grid-search study of 23,625 configurations across vector stores, embedding models, and LLMs on cross-domain data, quantifying the trade-offs between retrieval quality, similarity-based ranking, token usage, runtime, and hardware utilization. Shows that contextual compression filters substantially reduce token consumption and hardware load, at a similarity cost that is often acceptable depending on the RAG method and use case. Paper · PDF


🖥️ Open-Source Projects

Machine Learning / AI Agents
| Title | Tech Stack |
| --- | --- |
| Multi-Modal RAG | LangChain, ChromaDB |
| RAG Optimization | LangChain, LangSmith, FAISS |
| TalkYou | LangChain, LangGraph, Docker, Streamlit, FastAPI |
| LLMRoboFund | LangChain, SQL, Streamlit, ChromaDB |
| Electricity Price Forecasting | TF, XGBoost |
| Olivetti Face Recognition (CNN) | PyTorch |
| Fashion MNIST | W&B, TF |
| MobileNetV1 Julia Implementation | W&B, Julia, Flux |
| EfficientNetV2 Transfer Learning (CNN) | TF, W&B |
| Food Vision (CNN) | TF, W&B |
| Econ Dashboard | TF, Streamlit, SQL |
| Bitcoin Price Forecasting | PMDARIMA, SCIPY |
| Bike Sharing Demand Prediction | XGBoost, LGBM, OPTUNA, SCIKITLEARN |
| Financial Sentiment Classifier | TF |
| Bank Customer Deposit Prediction | XGBoost, Streamlit, SCIKITLEARN |
| Credit Score Prediction | SCIKITLEARN |

📊 GitHub Stats

Pinned

  1. MultiModal-RAG Public

    Jupyter Notebook

  2. RAG-Optimization Public

    Jupyter Notebook

  3. TalkYou Public

    TalkYou is an innovative open-source project designed to enable users to have a chat with any YouTube video. It brings you a customized chatbot experience, not only with the ability to chat but als…

    Python 1

  4. LLMRoboFund Public

    LLMRoboFund is a chatbot built on multi-document RAG. It combines RetrievalQA and SQL agents to ease investment research.

    Jupyter Notebook 7 1

  5. MobileNet-Julia Public

    Julia

  6. Olivetti-Faces-PyTorch Public

    Jupyter Notebook