semantic-caching

Here are 31 public repositories matching this topic...

zzbright1998 / SentenceKV

Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes cache at sentence level using semantic similarity.

natural-language-processing transformers memory-efficiency efficient-inference inference-optimization kv-cache llm semantic-caching colm2025

Updated Sep 29, 2025
Python

Hyperion-HQ / Hyperion

Ultra-low-latency LLM gateway with microsecond caching, dynamic routing, budgets, analytics, and forecasting.

Updated Apr 2, 2026
Go

renswickd / semantic-prompt-cache

This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing semantically similar prompt responses.

optimization ttl-cache rag mistral-api semantic-caching

Updated Jul 4, 2025
Python
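
The following is a minimal sketch of a semantic prompt cache with a TTL. It uses a toy bag-of-words vector in place of a real embedding model. `embed`, `cosine`, `SemanticCache`, `threshold`, and `ttl_seconds` are hypothetical names, not code from renswickd/semantic-prompt-cache. A lookup reuses a stored response when the new prompt's cosine similarity to a cached prompt meets the threshold, and entries are evicted once their TTL lapses.

```python
import math
import re
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: Counter
    response: str
    expires_at: float


@dataclass
class SemanticCache:
    threshold: float = 0.85      # minimum similarity that counts as a hit (hypothetical default)
    ttl_seconds: float = 3600.0  # how long an entry stays valid
    clock: Callable[[], float] = time.monotonic  # injectable so tests need not sleep
    entries: List[Entry] = field(default_factory=list)

    def get(self, prompt: str) -> Optional[str]:
        now = self.clock()
        # Evict expired entries, then find the most similar remaining prompt.
        self.entries = [e for e in self.entries if e.expires_at > now]
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e.vector), default=None)
        if best is not None and cosine(query, best.vector) >= self.threshold:
            return best.response  # hit: reuse the stored LLM response
        return None  # miss: the caller queries the LLM and then calls put()

    def put(self, prompt: str, response: str) -> None:
        self.entries.append(Entry(embed(prompt), response, self.clock() + self.ttl_seconds))


if __name__ == "__main__":
    fake_now = [0.0]
    cache = SemanticCache(threshold=0.8, ttl_seconds=60, clock=lambda: fake_now[0])
    cache.put("What is the capital of France?", "Paris")
    print(cache.get("what is the capital of France"))   # Paris
    print(cache.get("How do I bake bread?"))            # None
    fake_now[0] = 61.0
    print(cache.get("What is the capital of France?"))  # None (entry expired)
```

The threshold is the main tuning knob. Set too low, the cache returns answers to questions that were never asked. Set too high, it rarely hits.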

AzureManagedRedis / semantic-caching-demo-and-calculator

Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.

demo azure-managed-redis semantic-caching cost-modeling

Updated Nov 12, 2025
Python

redislabsdev / langcache-customer-data-eval

Evaluate how a semantic cache performs on your dataset by computing key KPIs over a threshold sweep and producing plots and CSVs.

redis evaluation vector-database semantic-caching

Updated Mar 11, 2026
Python
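
A threshold sweep can be illustrated with a small, hypothetical helper. For each test query, it takes the similarity to that query's nearest cached prompt and a human label saying whether reusing that answer would be correct. `threshold_sweep` and the two KPIs it reports (hit rate and precision) are illustrative choices, not the metrics or code of redislabsdev/langcache-customer-data-eval. The sample pairs are made up.

```python
import math
from typing import List, Sequence, Tuple


def threshold_sweep(
    pairs: Sequence[Tuple[float, bool]], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold, return (threshold, hit_rate, precision).

    pairs: (similarity between a query and its nearest cached prompt,
            whether reusing that cached answer would actually be correct).
    hit_rate: share of queries served from cache at this threshold.
    precision: share of those hits that were correct (NaN when there are no hits).
    """
    rows = []
    for t in thresholds:
        hits = [correct for score, correct in pairs if score >= t]
        hit_rate = len(hits) / len(pairs) if pairs else 0.0
        precision = sum(hits) / len(hits) if hits else math.nan
        rows.append((t, hit_rate, precision))
    return rows


if __name__ == "__main__":
    # Made-up, hand-labelled pairs for illustration only.
    pairs = [(0.95, True), (0.90, True), (0.86, False), (0.80, True), (0.70, False)]
    for t, hit_rate, precision in threshold_sweep(pairs, [0.75, 0.85, 0.90]):
        print(f"threshold={t:.2f} hit_rate={hit_rate:.2f} precision={precision:.2f}")
```

Raising the threshold trades hit rate for precision. A sweep like this exposes that curve so a threshold can be chosen per dataset rather than guessed.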

AP3008 / Janus

Local token-compression proxy for coding agents, written in Rust and built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack

rust redis local proxy-server tui tokio deduplication ratatui axum-framework token-compression semantic-caching

Updated Mar 16, 2026
Rust

Chief-Strategist-J / llm-observability-platform

High-performance LLM observability and evaluation platform with automated instrumentation, stateful chat orchestration, semantic vector memory caching, and scheduled Temporal workers for cost anomaly detection.

python go clickhouse semantic-search temporal rag opentelemetry vector-database llm-observability semantic-caching llmops-prompt-engineering

Updated Jun 12, 2026
Python

sensoris / semcache-python

Python library for the Semcache API

python ai openai llm anthropic semantic-caching

Updated Jun 9, 2025
Python

Clement-Okolo / Semantic-Cache

Semantic caching for LLM responses using Redis Vector DB, LangChain, and HuggingFace embeddings. It parses PDFs, generates FAQs with Groq, and serves similarity-based answers without redundant LLM calls.

chunking long-term-memory vector-database semantic-caching llamacloud live-caching batch-caching

Updated Feb 28, 2026
Jupyter Notebook

sunilp303 / claude-cost-gateway

An intelligent gateway for Claude APIs that dynamically routes requests to the most cost-efficient model, caches responses, and escalates based on confidence signals — reducing LLM spend without sacrificing quality.

api-proxy observability claude cost-optimization generative-ai llmops prompt-caching claude-api ai-gateway agentic-ai llm-gateway llm-routing inference-gateway semantic-caching llm-infrastructure

Updated May 6, 2026
Python

Pralishatripathy000 / Semantic-cache-llm

Semantic caching system using Meta Llama 3.3, Groq, FAISS, and Sentence Transformers to reduce LLM latency and API costs through meaning-based query matching.

machine-learning transformers vector-search groq faiss-vector-database llama3 semantic-caching

Updated Jun 4, 2026
Python

leisurelyleon / ragline

A retrieval-augmented generation pipeline in Python with a rigorous offline evaluation harness. Chunks and embeds documents, retrieves by vector similarity, and generates grounded answers — with pluggable LLM providers (including a deterministic local fake for tests) and metrics for retrieval quality and answer faithfulness. No API key required.

microsoft python google-ai fastapi token-cost rag-system semantic-caching zoo-design-studio golden-dataset

Updated Jun 1, 2026
Python

awesome-pro / smartmemo

Semantic memory and caching for LLM agents with classifier-validated equivalence instead of naive cosine thresholds.

python machine-learning sqlite pytorch embeddings developer-tools ai-agents cost-optimization faiss vector-search sentence-transformers semantic-memory llm llmops semantic-cache semantic-caching

Updated Jun 10, 2026
Python
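
The weakness of a bare cosine threshold is easy to reproduce. Two prompts that differ only in a number can look nearly identical to a bag-of-words or embedding similarity. The following sketch adds a second, pluggable validation step after nearest-neighbour retrieval. smartmemo describes using a classifier for that step. Here, a hypothetical rule-based `numbers_match` function stands in for it, and `embed`, `cosine`, and `lookup` are likewise illustrative names rather than smartmemo's API.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for an embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def numbers_match(query: str, cached_prompt: str) -> bool:
    """Hypothetical validator: prompts are equivalent only if they mention the same numbers."""
    return NUMBER.findall(query) == NUMBER.findall(cached_prompt)


def lookup(
    query: str,
    entries: List[Tuple[str, str]],
    threshold: float,
    validator: Optional[Callable[[str, str], bool]] = None,
) -> Optional[str]:
    """Return a cached response, or None on a miss."""
    if not entries:
        return None
    q = embed(query)
    # Stage 1: nearest cached prompt by cosine similarity.
    score, prompt, response = max(
        ((cosine(q, embed(p)), p, r) for p, r in entries), key=lambda s: s[0]
    )
    if score < threshold:
        return None
    # Stage 2: an optional equivalence check vetoes near-duplicates that differ in meaning.
    if validator is not None and not validator(query, prompt):
        return None
    return response


if __name__ == "__main__":
    entries = [("Convert 5 km to miles", "3.11 miles")]
    print(lookup("Convert 50 km to miles", entries, threshold=0.75))  # 3.11 miles (wrong reuse)
    print(lookup("Convert 50 km to miles", entries, threshold=0.75, validator=numbers_match))  # None
```

The two prompts in the usage example share four of five tokens, giving a cosine similarity of about 0.8. A threshold alone accepts the match, and only the second stage catches that the answers differ.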

manishklach / semantic-kv-control-plane

A systems research platform for semantic KV-cache orchestration, topology-aware memory placement, distributed prefix reuse, and rack-scale inference memory simulation.

Updated May 25, 2026
Python

sensoris / semcache-node

Node SDK for the Semcache API

node js openai llm semantic-caching

Updated Jun 18, 2025
JavaScript

maichanks / llm-cost-optimizer

LLM cost monitoring and optimization toolkit

redis monitoring budget cost-optimization llm openrouter prompt-compression semantic-caching token-tracking ai-cost openclaw api-cost-management

Updated Mar 16, 2026
JavaScript

developertogo / velo-sentinel

Production-grade Java 25 Virtual Thread inference gateway bridging NVIDIA Triton → Dynamo with Earliest Deadline First (EDF) priority queuing, adaptive batching, and async shadow validation.

redis distributed-systems grpc priority-queues load-balancing model-serving triton-inference-server virtual-threads inference-gateway semantic-caching nvidia-dynamo disaggregated-serving

Updated May 9, 2026
Java

Nagpal45 / memoria

Semantic LLM Gateway featuring intelligent prompt routing (basic MoE), L1/L2 semantic caching (Redis + pgvector), fault-tolerant model fallbacks, and real-time streaming telemetry. Built to reduce AI inference latency and optimize API compute costs.

redis devops typescript mongodb nextjs telemetry expressjs sse system-design pgvector ai-gateway llm-orchestration semantic-caching

Updated Mar 11, 2026
TypeScript
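
Memoria describes an L1/L2 semantic cache backed by Redis and pgvector. The following sketch is only an in-process illustration of the general two-tier pattern. It uses a plain dict for the exact-match L1 tier and a linear scan with `difflib.SequenceMatcher` string similarity in place of the vector search an L2 tier would normally use. `TwoTierCache` and its methods are hypothetical names, not Memoria's API.

```python
import difflib
from typing import Dict, List, Optional, Tuple


class TwoTierCache:
    """Hypothetical in-process two-tier cache: exact-match L1, similarity-based L2."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.l1: Dict[str, str] = {}          # normalized prompt -> response
        self.l2: List[Tuple[str, str]] = []   # (normalized prompt, response)

    @staticmethod
    def normalize(prompt: str) -> str:
        # Case-fold and collapse whitespace so trivial variants share an L1 key.
        return " ".join(prompt.lower().split())

    def get(self, prompt: str) -> Tuple[Optional[str], str]:
        key = self.normalize(prompt)
        if key in self.l1:
            return self.l1[key], "l1"
        # L2: linear scan; SequenceMatcher.ratio() stands in for embedding similarity.
        best_score, best_response = 0.0, None
        for cached_key, response in self.l2:
            score = difflib.SequenceMatcher(None, key, cached_key).ratio()
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.l1[key] = best_response  # promote so a repeat of this prompt hits L1
            return best_response, "l2"
        return None, "miss"

    def put(self, prompt: str, response: str) -> None:
        key = self.normalize(prompt)
        self.l1[key] = response
        self.l2.append((key, response))


if __name__ == "__main__":
    cache = TwoTierCache(threshold=0.9)
    cache.put("What is the capital of France?", "Paris")
    print(cache.get("what is  the capital of France?"))  # ('Paris', 'l1')
    print(cache.get("What is the capital of France"))    # ('Paris', 'l2')
    print(cache.get("What is the capital of France"))    # ('Paris', 'l1') after promotion
    print(cache.get("How do I bake bread?"))             # (None, 'miss')
```

The cheap exact-match tier absorbs verbatim repeats before any similarity search runs. Promoting L2 hits into L1 means a paraphrase pays the slower lookup only once.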

dchukkapalli-dev / semantic-caching-llm-companion

Machine-readable companion to the IEEE OJ-CS survey 'Semantic Caching and Response Reuse for Large Language Model Services: A Survey' (Chukkapalli, Mishra, Naik, 2026): 21-work evidence matrix, systematic-search log, proposed benchmark trace schema, stdlib-only contract validator, and CPU pilot. Code MIT; data CC-BY-4.0.

benchmark survey prisma inference-serving llm semantic-caching response-reuse cache-correctness

Updated Jun 5, 2026
Python

nunoferna / aegis-llm

LLMOps API Gateway in Go. Optimizes GenAI workloads with Qdrant semantic caching, Redis rate-limiting, and OpenTelemetry metrics.

docker kubernetes redis golang api-gateway proxy rate-limiting gemini openai cloud-native helm-chart opentelemetry qdrant llm anthropic semantic-caching

Updated Mar 15, 2026
Go
