nnfuzzy/her301

her301

An open-source voice AI companion. Real-time speech via Gemini Live, persistent memory via Zep, persona-as-data, and a tool registry that actually does things.

her301 — Your own Samantha. Real-time voice AI you can run.

The voice companion the movie Her promised — open source, self-hostable, and built around the four technologies that finally made it possible: Gemini Live for sub-second audio, Zep for memory across sessions, Genkit for tool calling, and a persona-as-data architecture that turns the same tool registry into completely different companions.

What it is: a voice-first AI companion you can run yourself, with a real persona, real memory, and tools that affect the real world. What it isn't: a voice agent framework (use Pipecat / Vapi / Retell / LiveKit), a closed companion product (Replika / Pi / ChatGPT Voice), or a black-box SDK wrapper that hides the protocol. The repo ships both Gemini Live paths — the production version on Google's @google/genai SDK with ephemeral auth tokens, and the original raw-WebSocket implementation kept in the tutorial as a byte-level teaching reference.


Try it without installing

A live Marcus persona is hosted at her301.targeting.ai.

Want to see her use her tools on something you sent? Send a short email or calendar invite to nnfuzzy.her@gmail.com — it's the test inbox the demo reads from. The next time someone runs the demo, Samantha may surface your message in the Gmail / calendar tools. Polite use, please — it's a real Gmail account and bounces aren't fun.


TL;DR — get it running

pnpm install
cp .env.example .env             # fill in keys (see "Required keys" below)
pnpm db:push                     # create Postgres tables
pnpm dev                         # http://localhost:3000

Minimum viable run (text mode only, default theodore persona, no tools that need OAuth):

GOOGLE_API_KEY=…             # Gemini key (also covers voice mode if you enable it)
HER301_DATABASE_URL=postgresql://…

Full voice mode requires:

GOOGLE_API_KEY=…              # Gemini Live is Google-only
ZEP_API_KEY=…                 # optional but strongly recommended

Voice in/out via Gemini Live, text fallback via DeepSeek/OpenAI/Gemini (configurable). Voice mode is Gemini-only — there is no provider swap for the realtime audio path.


Detailed documentation

Architecture

┌────────────┐   tRPC      ┌──────────────────┐   Genkit flow   ┌──────────────┐
│  React UI  │───────────▶ │  Express server  │────────────────▶│  Companion   │
│  (Vite)    │             │   (no auth gate) │                 │  (text mode) │
└─────┬──────┘             └────────┬─────────┘                 └──────┬───────┘
      │                             │                                  │
      │  WebSocket via SDK          │  /api/live/token (rate-limited)  │
      ▼                             ▼                                  ▼
┌────────────────────┐      ┌────────────────┐               ┌──────────────┐
│  Gemini Live API   │      │  PostgreSQL    │               │  Tool        │
│  (audio in/out)    │      │  (Drizzle ORM) │◀──────────────│  registry    │
└────────────────────┘      └────────────────┘               └──────┬───────┘
                                                                    │
                                                            ┌───────▼──────┐
                                                            │  Zep Cloud   │
                                                            │  (memory)    │
                                                            └──────────────┘

Stack: React 19 + Vite (client) · Express + tRPC 11 (server) · Drizzle ORM + PostgreSQL · Genkit + multi-provider LLMs (text) · Gemini Live via the @google/genai SDK (voice) · Zep Cloud (long-term memory) · Tailwind 4.

Key paths:

  • server/_core/index.ts: Express bootstrap + Live API endpoints (no auth middleware by default)
  • server/routers.ts — tRPC routes (chat, threads, sidebar, expenses)
  • server/genkit/{ai,companion}.ts — Genkit instance (multi-provider) + companion flow
  • server/live/token.ts — Gemini Live token + system-prompt injection + Zep context
  • server/tools/{config,registry}.ts — tool config and dual-resolution (Genkit + Live)
  • server/persona/{registry,schema,theodore,marcus}.ts — persona-as-data
  • server/memory/zep.ts — Zep Cloud wrapper (gracefully no-ops when ZEP_API_KEY absent)
  • server/auth.ts — Live API per-caller rate limiter (in-memory, IP-keyed)
  • client/src/hooks/useGeminiLive.ts — WebSocket client, AudioWorklet capture, PCM playback
  • client/src/components/VoiceOverlay.tsx — voice UI

Setup in detail

1. Prerequisites

  • Node 20+
  • pnpm 9+
  • PostgreSQL 14+
  • (no auth dependency) — her301 ships unauthenticated; bring your own provider for multi-tenant deployments

2. Environment

All env vars are read by server/_core/env.ts. Vite-prefixed vars (none required by default) are picked up from the same file by the client build.

Variable | Required? | Purpose
--- | --- | ---
HER301_DATABASE_URL (or DATABASE_URL) | yes | Postgres connection string. The HER301_-prefixed form takes priority.
LLM_PROVIDER | optional (default: gemini) | Text-mode provider. One of gemini, deepseek, openai. Voice mode is Gemini-only regardless.
DEEPSEEK_API_KEY | conditional | Required if LLM_PROVIDER=deepseek.
OPENAI_API_KEY | conditional | Required if LLM_PROVIDER=openai.
GOOGLE_API_KEY (or GEMINI_API_KEY) | required for voice | Gemini Live is the only voice path. Also used as text provider when LLM_PROVIDER=gemini.
ZEP_API_KEY | optional but recommended | Long-term memory. All Zep operations gracefully no-op when absent.
LIVE_MODEL | optional | Override the Gemini Live model. Default: gemini-3.1-flash-live-preview. Set to gemini-3.1-pro-live-preview for higher reasoning quality at ~10× the audio cost.
ACTIVE_PERSONA | optional (default: theodore) | One of the slugs registered in server/persona/registry.ts.
HER301_LIVE_MAX_PER_DAY | optional (default: 10) | Per-caller daily quota for /api/live/token (keyed by request IP unless you add auth).
HER301_LIVE_COOLDOWN_SEC | optional (default: 30) | Min seconds between Live token mints per caller.
HER301_EMAIL_DRY_RUN | optional | Set to 1 to make composeEmailDraft/sendEmailDraft log instead of touching Gmail. Tests/CI rely on this.
TAVILY_API_KEY | optional | Enables webSearch tool.
SERPER_API_KEY | optional | Enables serperSearch and getNews tools.
WEATHER_API_KEY (WeatherAPI.com) | optional | Enables getWeather tool.
GOOGLE_OAUTH_CLIENT_ID | optional | Enables Calendar / Gmail / Tasks / Contacts tools.
GOOGLE_OAUTH_CLIENT_SECRET | optional | Enables Calendar / Gmail / Tasks / Contacts tools.
GOOGLE_OAUTH_REFRESH_TOKEN | optional | Run pnpm google:setup to get one.

Cost note on LIVE_MODEL. The default is flash (cheaper). Switch to pro only when you need stronger reasoning during voice — it's roughly 10× the audio cost.
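
The precedence rules in the environment table can be sketched as a small resolver. This is an illustration only, not the contents of server/_core/env.ts: the names resolveEnv, ResolvedEnvSketch, and isTextProvider are hypothetical, and preferring GOOGLE_API_KEY over GEMINI_API_KEY when both are set is an assumption the README does not state.

```typescript
// Hypothetical sketch of the env precedence rules; not the real server/_core/env.ts.
type TextProvider = "gemini" | "deepseek" | "openai";

interface ResolvedEnvSketch {
  databaseUrl: string;
  textProvider: TextProvider;
  googleApiKey: string | undefined;
  voiceEnabled: boolean; // voice mode needs a Google key
  zepEnabled: boolean; // Zep calls no-op when the key is absent
}

function isTextProvider(value: string): value is TextProvider {
  return value === "gemini" || value === "deepseek" || value === "openai";
}

function resolveEnv(env: Record<string, string | undefined>): ResolvedEnvSketch {
  // The HER301_-prefixed form takes priority over the generic DATABASE_URL.
  const databaseUrl = env["HER301_DATABASE_URL"] || env["DATABASE_URL"];
  if (!databaseUrl) {
    throw new Error("HER301_DATABASE_URL (or DATABASE_URL) is required");
  }

  // LLM_PROVIDER defaults to gemini and only affects text mode.
  const rawProvider = env["LLM_PROVIDER"] || "gemini";
  if (!isTextProvider(rawProvider)) {
    throw new Error("Unknown LLM_PROVIDER: " + rawProvider);
  }

  // Assumption: GOOGLE_API_KEY wins when both Google key names are set.
  const googleApiKey = env["GOOGLE_API_KEY"] || env["GEMINI_API_KEY"];

  // Conditional keys: each text provider needs its own key.
  if (rawProvider === "deepseek" && !env["DEEPSEEK_API_KEY"]) {
    throw new Error("DEEPSEEK_API_KEY is required when LLM_PROVIDER=deepseek");
  }
  if (rawProvider === "openai" && !env["OPENAI_API_KEY"]) {
    throw new Error("OPENAI_API_KEY is required when LLM_PROVIDER=openai");
  }
  if (rawProvider === "gemini" && !googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required when LLM_PROVIDER=gemini");
  }

  return {
    databaseUrl,
    textProvider: rawProvider,
    googleApiKey,
    voiceEnabled: Boolean(googleApiKey),
    zepEnabled: Boolean(env["ZEP_API_KEY"]),
  };
}
```

Passing the environment as a plain record (rather than reading process.env inside the function) keeps the rules easy to exercise in tests.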

3. Database

pnpm db:push                  # generate + apply Drizzle migrations

Schema: users, threads, messages, reminders, expenses (5 tables). Source of truth in drizzle/schema.ts.

4. Auth

her301 ships unauthenticated by default — it's a single-user companion meant to run on your own infra. There's no login screen, no /api gate, no auth dependencies. Cost protection for the Live API still lives in server/auth.ts and rate-limits per request IP.

Adding auth for a multi-tenant deployment. Put any standard Express auth middleware in front of /api:

// server/_core/index.ts (your fork)
import { requireAuth } from "./your-auth";  // Firebase, Auth0, magic link, …
app.use("/api", requireAuth);

Then use the authenticated principal (instead of req.ip) as the rate-limit key in the /api/live/token handler — registerLiveSession(user.uid) works the same.

The Express server already trusts a single proxy hop (app.set("trust proxy", 1)) so req.ip resolves correctly behind a Kubernetes Ingress.

5. Optional integrations

  • Zep Cloud — set ZEP_API_KEY. Without it, server/memory/zep.ts:isConfigured() returns false and every Zep call no-ops. Conversations still persist to Postgres.
  • Tools requiring API keys — see the table above. Inactive tools are logged at startup by logActiveTools().
  • Google services — run pnpm google:setup to walk through the OAuth consent flow and get a refresh token. This single token grants Calendar, Gmail, Tasks, and Contacts access. (Note: ensure the consent screen requests the contacts.readonly scope — the contacts tool will silently return "Contacts aren't set up yet" if missing.)

Persona system

A persona in her301 is data, not code — a typed JSON object validated by PersonaSchema (in server/persona/schema.ts). The same tool registry, memory layer, and Genkit flow produce a completely different companion depending on which persona is loaded.

ACTIVE_PERSONA=theodore   # default — canonical "Her" reference (English, LA)
ACTIVE_PERSONA=marcus     # Berlin senior backend engineer, post-layoff (German)

Each persona declares:

  • companionName, userName, language, optional voiceLanguageHint
  • personality — bullet points the LLM internalizes
  • userContext — job, location, relationships, habits, emotional state, timezone
  • toolGuidance — per-tool tone hint (e.g. "use the calendar to surface interview slots, not to organize his week")
  • voiceStyle — voice-mode-specific tone overrides
  • intro — first-line behaviour

Each persona also gets isolated memory: me-${slug} Zep userId and a stable users.id row in Postgres. Switching ACTIVE_PERSONA cannot leak one persona's memory into another's.
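
To make "persona as data" concrete, here is a rough TypeScript shape inferred from the field list above. It is a sketch under assumptions: the real PersonaSchema in server/persona/schema.ts may use different field types, nesting, and a validation library, and the names PersonaSketch, missingPersonaFields, zepUserIdFor, and assertUniqueDbUserIds are hypothetical. Treating dbUserId as a number is also an assumption.

```typescript
// Hypothetical shape inferred from the README's field list; the real schema may differ.
interface PersonaSketch {
  companionName: string;
  userName: string;
  language: string;
  voiceLanguageHint?: string;
  personality: string[];
  userContext: {
    job: string;
    location: string;
    timezone: string;
    relationships: string[];
    habits: string[];
    emotionalState: string;
  };
  toolGuidance: Record<string, string>; // tool id -> tone hint
  voiceStyle: string[];
  intro: string;
}

// Assumption: every field except voiceLanguageHint is mandatory.
const REQUIRED_PERSONA_FIELDS = [
  "companionName",
  "userName",
  "language",
  "personality",
  "userContext",
  "toolGuidance",
  "voiceStyle",
  "intro",
] as const;

// Returns the names of required fields that a candidate persona leaves undefined.
function missingPersonaFields(candidate: Partial<PersonaSketch>): string[] {
  return REQUIRED_PERSONA_FIELDS.filter((field) => candidate[field] === undefined);
}

// Memory isolation: each persona slug maps to its own Zep user id.
function zepUserIdFor(slug: string): string {
  return `me-${slug}`;
}

// Each registry entry pairs a persona with a Postgres users.id row.
interface RegistryEntrySketch {
  persona: PersonaSketch;
  dbUserId: number; // assumption: numeric primary key
}

// Throws if two personas would share a users.id row, which would mix their data.
function assertUniqueDbUserIds(registry: Record<string, RegistryEntrySketch>): void {
  const ownerById = new Map<number, string>();
  for (const slug of Object.keys(registry)) {
    const entry = registry[slug];
    if (!entry) continue;
    const owner = ownerById.get(entry.dbUserId);
    if (owner !== undefined) {
      throw new Error(`dbUserId ${entry.dbUserId} is used by both ${owner} and ${slug}`);
    }
    ownerById.set(entry.dbUserId, slug);
  }
}
```

A check like assertUniqueDbUserIds is one way to enforce the "pick a fresh dbUserId" rule at startup instead of relying on review.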

Adding a persona

  1. Create server/persona/<slug>.ts exporting a Persona (see theodore.ts / marcus.ts).
  2. Register it in server/persona/registry.ts: extend the PersonaSlug type and add an entry to REGISTRY mapping the slug to { persona, dbUserId }. Pick a fresh dbUserId.
  3. (Optional) Generate seed data via the simulation engine: pnpm simulate:persona --persona=<slug> --instruction="…".
  4. Push seed data to live services: pnpm seed:persona --persona=<slug> --from-json=scripts/simulate/output/<slug>.json.

The simulation engine itself is a five-stage Genkit pipeline (world → calendar → gmail → reminders → zep) under scripts/simulate/. Each stage feeds the next so cross-references stay coherent (recruiter emails reference real interviews on the calendar, reminders point at calendar events, Zep convos echo the inbox). Override the model with SIMULATE_MODEL=googleai/gemini-3-flash-preview for cheaper runs.
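
The "each stage feeds the next" idea can be sketched as a fold over an ordered stage list. Only the five stage names come from this README; StageFn, runStages, and withSteering are hypothetical names, the stages are synchronous stubs here (the real ones are Genkit flows and would be async), and the exact formatting of the steering suffix is an assumption.

```typescript
// Stage names are from the README; everything else is an illustrative stand-in.
type StageName = "world" | "calendar" | "gmail" | "reminders" | "zep";
const STAGE_ORDER: StageName[] = ["world", "calendar", "gmail", "reminders", "zep"];

type StageOutputs = Partial<Record<StageName, unknown>>;

// A stage sees everything produced so far plus its own prompt.
type StageFn = (prior: StageOutputs, prompt: string) => unknown;

// Appends the optional --instruction text to a stage prompt.
function withSteering(basePrompt: string, instruction?: string): string {
  return instruction ? `${basePrompt}\nExtra steering: ${instruction}` : basePrompt;
}

// Runs the stages in order, handing each one a copy of all earlier outputs.
function runStages(
  stages: Record<StageName, StageFn>,
  basePrompts: Record<StageName, string>,
  instruction?: string,
): StageOutputs {
  const outputs: StageOutputs = {};
  for (const stageName of STAGE_ORDER) {
    const prompt = withSteering(basePrompts[stageName], instruction);
    // Copy the accumulated outputs so a stage cannot mutate earlier layers.
    outputs[stageName] = stages[stageName]({ ...outputs }, prompt);
  }
  return outputs;
}
```

Because later stages only read earlier output, a calendar event created in the second stage can be referenced by name in the gmail and reminders stages, which is what keeps the seed data coherent.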


Voice mode (Gemini Live)

The voice path is Gemini Live only. OpenAI Realtime is a different protocol; Anthropic and DeepSeek have no realtime audio offering. The client opens a WebSocket against wss://generativelanguage.googleapis.com/... using the ephemeral token minted by /api/live/token (through the @google/genai SDK in production; the raw-WebSocket variant is kept in the tutorial).

Flow

  1. Client → POST /api/live/token (no auth by default; rate-limited per caller IP).
  2. Server checks the rate limit, mints an ephemeral token, fetches Zep context for the active thread, builds a voice-mode system prompt via buildVoicePrompt(persona, context), and returns { token, wsUrl, useConfigFormat, model, tools, systemPrompt, sessionsRemaining }. The token is ephemeral, so the project API key is not part of the response (see the security note below).
  3. Client opens the WebSocket and sends { setup: { model, generationConfig, systemInstruction, tools, sessionResumption, realtimeInputConfig } }.
  4. AudioWorklet captures mic at 16 kHz mono PCM, base64-encodes each chunk, and sends it as { realtimeInput: { audio: { data, mimeType: "audio/pcm" } } }.
  5. The server's audio responses (24 kHz PCM) are scheduled into an AudioContext for low-latency playback.
  6. On turnComplete, the client POSTs the transcribed turn to /api/live/turn, which persists it to both Postgres and Zep.
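
Step 4 of the flow above (PCM capture, base64, realtimeInput envelope) can be sketched as follows. The function names are hypothetical and this is not the code in useGeminiLive.ts; it assumes the samples have already been resampled to 16 kHz mono, and it uses the mimeType string exactly as written in this README.

```typescript
// Convert Web Audio float samples (range -1..1) to 16-bit little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

// Base64-encode raw bytes using the btoa global (browsers and recent Node).
function pcmToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] ?? 0);
  }
  return btoa(binary);
}

interface RealtimeAudioMessage {
  realtimeInput: { audio: { data: string; mimeType: string } };
}

// Wrap one captured chunk in the envelope described in step 4.
function buildAudioMessage(samples: Float32Array): RealtimeAudioMessage {
  return {
    realtimeInput: {
      audio: { data: pcmToBase64(floatTo16BitPcm(samples)), mimeType: "audio/pcm" },
    },
  };
}
```

In the browser each chunk would then go out as socket.send(JSON.stringify(buildAudioMessage(chunk))), where socket and chunk are whatever your capture loop provides.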

Hard-won implementation notes

These all live in client/src/hooks/useGeminiLive.ts and server/live/token.ts. Each one cost an evening:

  • Setup format: raw WebSocket uses { setup: { … } }, not { config: { … } }. The latter is SDK-only and gets you a close code 1007 with no useful error message.
  • Browser Blob parsing: onmessage data arrives as a Blob in browsers. Must await event.data.text() before JSON.parse.
  • getUserMedia gesture: must be called from inside a click handler. Calling it from a WebSocket open callback silently fails.
  • Module-level WebSocket state: the WebSocket and AudioContext live as module-level vars (not React refs) to avoid stale-closure bugs across re-renders. Tradeoff: HMR requires a full page reload.
  • Test with Node first: when adding a new message type, exercise the WebSocket from tsx + the ws package before touching the browser. Server error messages are 100× clearer than browser ones.
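
The first two notes can be sketched in a few lines. This is illustrative rather than the project's code: buildSetupMessage, decodeLiveMessage, and the TextReadable interface are hypothetical names, the inner field shapes (responseModalities, systemInstruction.parts) follow my understanding of the Gemini Live setup message and should be checked against Google's reference, and sessionResumption / realtimeInputConfig are omitted for brevity.

```typescript
interface LiveSetupMessage {
  setup: {
    model: string;
    generationConfig: { responseModalities: string[] };
    systemInstruction: { parts: { text: string }[] };
    tools: unknown[];
  };
}

// Raw WebSocket clients wrap everything in "setup". Sending "config" instead
// is the mistake the first note warns about (close code 1007).
function buildSetupMessage(model: string, systemPrompt: string, tools: unknown[]): LiveSetupMessage {
  return {
    setup: {
      model, // passed through unchanged; the server response supplies it
      generationConfig: { responseModalities: ["AUDIO"] },
      systemInstruction: { parts: [{ text: systemPrompt }] },
      tools,
    },
  };
}

// Structural stand-in for a browser Blob: anything with an async text() method.
interface TextReadable {
  text(): Promise<string>;
}

// Browsers deliver frames as Blobs, Node's ws package as strings/buffers.
// Normalise to text before JSON.parse.
async function decodeLiveMessage(data: string | TextReadable): Promise<unknown> {
  const raw = typeof data === "string" ? data : await data.text();
  return JSON.parse(raw);
}
```

In an onmessage handler this becomes const message = await decodeLiveMessage(event.data), which works unchanged for both Blob and string payloads.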

Cost guardrails

/api/live/token is rate-limited per caller (keyed by request IP unless you add auth) via server/auth.ts:registerLiveSession:

  • max sessions per rolling 24h: HER301_LIVE_MAX_PER_DAY (default 10)
  • min seconds between mints: HER301_LIVE_COOLDOWN_SEC (default 30)

Quota exhaustion returns 429 with a Retry-After header and a JSON body containing reason (cooldown | daily_quota) and retryAfterSec.
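
A minimal in-memory limiter with these two rules might look like the sketch below. It is not the implementation in server/auth.ts: createLiveLimiter, LimitDecision, and toHttpResponse are hypothetical names, and checking the cooldown before the daily quota is an assumption.

```typescript
// Hypothetical sketch of a cooldown + rolling-24h limiter; not the real server/auth.ts.
type LimitDecision =
  | { ok: true; sessionsRemaining: number }
  | { ok: false; reason: "cooldown" | "daily_quota"; retryAfterSec: number };

const DAY_MS = 24 * 60 * 60 * 1000;

function createLiveLimiter(maxPerDay: number = 10, cooldownSec: number = 30) {
  // key (IP or user id) -> timestamps in ms of successful mints, oldest first
  const mintLog = new Map<string, number[]>();

  function register(key: string, nowMs: number = Date.now()): LimitDecision {
    // Keep only mints inside the rolling 24h window.
    const recent = (mintLog.get(key) ?? []).filter((t) => nowMs - t < DAY_MS);
    mintLog.set(key, recent);

    const last = recent[recent.length - 1];
    if (last !== undefined && nowMs - last < cooldownSec * 1000) {
      const waitMs = cooldownSec * 1000 - (nowMs - last);
      return { ok: false, reason: "cooldown", retryAfterSec: Math.ceil(waitMs / 1000) };
    }

    if (recent.length >= maxPerDay) {
      // A slot frees up when the oldest mint leaves the window.
      const oldest = recent[0] ?? nowMs;
      return { ok: false, reason: "daily_quota", retryAfterSec: Math.ceil((oldest + DAY_MS - nowMs) / 1000) };
    }

    recent.push(nowMs);
    return { ok: true, sessionsRemaining: maxPerDay - recent.length };
  }

  return { register, reset: () => mintLog.clear() };
}

// Maps a denial to the 429 shape described above.
function toHttpResponse(decision: LimitDecision) {
  if (decision.ok) {
    return { status: 200, headers: {} as Record<string, string>, body: { sessionsRemaining: decision.sessionsRemaining } };
  }
  return {
    status: 429,
    headers: { "Retry-After": String(decision.retryAfterSec) } as Record<string, string>,
    body: { reason: decision.reason, retryAfterSec: decision.retryAfterSec },
  };
}
```

Because the state is a plain Map inside the closure, restarting the process clears it, and the injected nowMs parameter lets tests step through time without real delays.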

Security note. The browser only ever sees an ephemeral token — single-use, 30 minutes, minted server-side via GoogleGenAI.authTokens.create. The project Gemini API key never leaves the her301 pod. Combined with the per-IP rate limit, this is the recommended posture for both single-user self-hosting and a public hosted demo.


Tools

her301 ships with 25 tool functions across the 11 categories listed below. Each tool is enabled in server/tools/config.ts and gated on an env key; missing keys disable the tool gracefully, and the disabled tools are logged at startup.

Category | Tool IDs | Required env
--- | --- | ---
Web search | webSearch (Tavily), serperSearch | TAVILY_API_KEY / SERPER_API_KEY
Calendar | getCalendar, createEvent, deleteEvent, updateEvent | GOOGLE_OAUTH_REFRESH_TOKEN
Email (read) | searchInbox, readEmail | GOOGLE_OAUTH_REFRESH_TOKEN (+ gmail.readonly)
Email (send) | composeEmailDraft, sendEmailDraft | GOOGLE_OAUTH_REFRESH_TOKEN (+ gmail.compose)
Tasks | getTasks, addTask, completeTask | GOOGLE_OAUTH_REFRESH_TOKEN
Contacts | searchContacts | GOOGLE_OAUTH_REFRESH_TOKEN (+ contacts.readonly)
Notes | createNote, listRecentNotes, searchNotes, deleteNote | (Postgres only; uses FTS via to_tsvector)
Reminders | setReminder, getReminders | (Postgres only)
Expenses | logExpense, getExpenseSummary, getRecentExpenses | (Postgres only)
Weather | getWeather | WEATHER_API_KEY (WeatherAPI.com)
News | getNews | SERPER_API_KEY

Email-send safety pattern. composeEmailDraft always creates a Gmail draft first and returns a preview; sendEmailDraft actually sends it. The persona prompt instructs the companion to read the draft back to the user and only send after explicit verbal confirmation. Set HER301_EMAIL_DRY_RUN=1 to make both tools log-only — used in tests/CI so they never accidentally send mail. To enable email sending, run pnpm google:setup to mint a refresh token with the gmail.compose scope.

Tools are dual-resolved: resolveTools() returns Genkit ToolAction[] for the text companion flow; resolveToolDeclarations() returns plain JSON declarations for the Live API. Adding a new tool means three updates: the tool file in server/tools/, an entry in TOOL_MAP + LIVE_DECLARATIONS_MAP + TOOL_EXECUTOR_MAP (server/tools/registry.ts), and a ToolConfig entry in server/tools/config.ts.
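
The dual-resolution idea, one config list feeding both an executor map and a declaration map, can be sketched like this. All types and constants here are hypothetical stand-ins: the real registry returns Genkit ToolAction objects and its executors are presumably async, whereas this sketch uses plain synchronous stubs so it stays self-contained.

```typescript
// Hypothetical stand-ins for ToolConfig, LIVE_DECLARATIONS_MAP and TOOL_EXECUTOR_MAP.
interface ToolConfigSketch {
  id: string;
  enabled: boolean;
  requiredEnv?: string[]; // tool is inactive unless all of these are set
}

interface LiveDeclarationSketch {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

type ToolExecutorSketch = (args: Record<string, unknown>) => string;
type EnvLike = Record<string, string | undefined>;

const SKETCH_CONFIG: ToolConfigSketch[] = [
  { id: "getWeather", enabled: true, requiredEnv: ["WEATHER_API_KEY"] },
  { id: "setReminder", enabled: true }, // Postgres only, no env key
];

const SKETCH_DECLARATIONS: Record<string, LiveDeclarationSketch> = {
  getWeather: { name: "getWeather", description: "Current weather for a city", parameters: { city: "string" } },
  setReminder: { name: "setReminder", description: "Store a reminder", parameters: { text: "string" } },
};

const SKETCH_EXECUTORS: Record<string, ToolExecutorSketch> = {
  getWeather: (args) => `Weather lookup for ${String(args["city"])} (stubbed)`,
  setReminder: (args) => `Reminder saved: ${String(args["text"])} (stubbed)`,
};

// Env gating: a tool is active when enabled and every required key is present.
function activeToolIds(env: EnvLike): string[] {
  return SKETCH_CONFIG
    .filter((tool) => tool.enabled && (tool.requiredEnv ?? []).every((key) => Boolean(env[key])))
    .map((tool) => tool.id);
}

// Live API side: plain JSON declarations for the active tools only.
function resolveToolDeclarations(env: EnvLike): LiveDeclarationSketch[] {
  const declarations: LiveDeclarationSketch[] = [];
  for (const id of activeToolIds(env)) {
    const declaration = SKETCH_DECLARATIONS[id];
    if (declaration) declarations.push(declaration);
  }
  return declarations;
}

// Execution side: always returns a string and never throws to the model.
function executeToolCall(name: string, args: Record<string, unknown>, env: EnvLike): string {
  const run = SKETCH_EXECUTORS[name];
  if (!run || activeToolIds(env).indexOf(name) === -1) {
    return `Tool ${name} is not available.`;
  }
  try {
    return run(args);
  } catch (err) {
    return `Tool ${name} failed: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```

Keeping both maps keyed by the same tool id is what makes the checklist for a new tool mechanical: a missing entry in either map shows up as an unavailable tool rather than a crash.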


Resetting state during testing

Operations are listed from lightest to most invasive. Stop at the cheapest one that solves your problem.

1. Stuck voice state, audio not playing, "Not connected to Samantha" after a fix. Hard-reload the browser (Cmd+Shift+R / Ctrl+Shift+R). Module-level session, audioCtx, and mediaStream in useGeminiLive.ts survive HMR.

2. Server-side change (env edit, persona switch, code edit that touches startup).

kill $(lsof -ti :3000)
pnpm dev

3. Switch personas. Set ACTIVE_PERSONA=marcus or ACTIVE_PERSONA=theodore in your .env, then restart the dev server. Personas have isolated Zep userIds (me-${slug}) and Postgres users.id rows — switching doesn't leak state.

4. Reset the Live API rate limiter. The 10-per-day cap and 30s cooldown live in an in-memory Map (server/auth.ts:liveSessionLog). A dev-server restart clears it. There's no separate reset command. For tests, import _resetRateLimitState from ./auth.

5. Regenerate seed data (no live writes).

The simulation engine chains five Gemini-driven flows: world → calendar → gmail → reminders → zep. Each flow consumes the prior layers' output, so cross-references stay coherent (recruiter emails reference real interviews on the calendar, reminders point at calendar events, Zep conversations echo the inbox).

--instruction is appended to every one of those five prompts as Extra steering: <text>. The persona file does the heavy lifting (relationships, habits, tone, tool guidance — that's all in marcus.ts / theodore.ts). --instruction is additive flavor for this run — the current week's mood, what tensions to surface, what's on his mind. Short generic instructions like "this week" won't actually steer anything; rich, specific ones will.

# Marcus example — names a tension to surface across all five layers
pnpm simulate:persona --persona=marcus --instruction="\
Mid-week, mid-grind. Marcus has two active interview loops (a Berlin Series-B \
fintech and a remote-first scale-up) plus one freelance client. A system-design \
rejection arrived last Friday and he hasn't told Lena. Surface this tension: \
interview prep on the calendar, recruiter follow-ups + one polite 'went another \
direction' email in gmail, LeetCode plus the unmade Hamburg call in reminders, \
one zep conversation where Samantha gently notices he's avoiding the rejection."

# Theodore example
pnpm simulate:persona --persona=theodore --instruction="\
Quiet melancholy week. Theodore is editing letters for grandparents writing to \
their granddaughter who's away at college. The Catherine divorce paperwork \
arrived Tuesday. Surface across: a hesitant therapist appointment on the calendar, \
a paralegal email about signing dates, a reminder about the response he keeps \
deferring, and a zep convo where Samantha notices he's been quieter."

Output goes to scripts/simulate/output/<slug>.json. Cost: a few cents in Gemini Pro calls. Inspect the JSON (calendar event titles, email subjects, Zep convos) before pushing it to live services.

6. Wipe + reseed live services (Postgres reminders, Google Calendar, Gmail labels).

pnpm seed:clean --persona=marcus
pnpm seed:persona --persona=marcus \
  --from-json=scripts/simulate/output/marcus.json

Side effects:

  • Postgres: drops Marcus's reminders rows
  • Google Calendar: deletes events tagged SEEDED (from your real Google account)
  • Gmail: removes the SEEDED label from messages

⚠️ Does NOT wipe Zep memory. cleanZep() in scripts/seed-persona.ts currently logs a warning and leaves the thread in place. Use step 7 if you need a clean Zep slate.

7. Wipe Zep memory (manual workaround). Until cleanZep() is wired to call client.user.delete():

  • Open https://app.getzep.com
  • Find the user (me-marcus or me-theodore)
  • Delete the user — this cascades to all their threads
  • Then re-run pnpm seed:persona … to repopulate

8. Reset the database (nuclear).

pnpm db:push --force

Wipes users, threads, messages, reminders, and expenses for all personas. Afterwards, restart pnpm dev (which auto-runs seedDefaultUser), then re-seed each persona.


Tutorial

The 8-chapter tutorial in tutorial/ walks through the full implementation:

  1. WebSocket protocol and the setup vs config gotcha
  2. Audio capture (AudioWorklet, 16 kHz PCM) and playback (24 kHz scheduling)
  3. Tool architecture (config → registry → dual resolution)
  4. Persona system (data, not code)
  5. Memory and context (Zep + Postgres dual persistence)
  6. Tool round-trip (Live API tool calls)
  7. Failure modes (close codes, mic gestures, stale closures)
  8. Putting it all together

It's its own Deno project (tutorial/deno.json) with runnable examples in tutorial/examples/.


Deployment (Kubernetes)

A reference Kubernetes manifest is shipped under infrastructure/. The image lives in whatever container registry you choose (Artifact Registry, Docker Hub, GHCR…). Highlights of the reference deployment:

  • Container image: <your-registry>/her301:latest — build via docker build -t <tag> . and push to your own registry.
  • Port: 3000 (matches the dev server port).
  • Env injected from a ConfigMap + per-key Secret (see .env.example for the variable list).
  • Ingress wired with cert-manager + Let's Encrypt for https://. Update the host to your domain.

The author's reference deployment runs at her301.targeting.ai. Yours can run anywhere a Node 20+ container can run.

CI: .github/workflows/build-and-push.yml builds and pushes the image only when paths under her301/ change. It also opens an automatic PR to main for feature/** branches.


Tests

pnpm test                       # full suite (Vitest)
pnpm test server/auth.test.ts   # rate limiter
pnpm test server/tools/         # tool tests

Known issue: server/tools/expenses.test.ts currently fails on main because of a timezone offset. It is unrelated to launch but worth a follow-up before CI gates merges.


Contributing

PRs welcome. Three things to keep in mind:

  1. TypeScript must pass: pnpm check is the gate. No any, no as unknown as shortcuts in PRs.
  2. Tests for behavior changes: especially anything touching auth.ts, the tool registry, or the persona schema. The pattern is functional + Result-style: tools return a string (a success message) on success and never throw to the LLM.
  3. Personas are clean-room. New personas must use fictional names — no real people, no real companies, no defamation surface.

License

MIT — see LICENSE. Permissive: do whatever you like, just keep the copyright notice.


Acknowledgements

The persona of Samantha and the cinematic reference are drawn from Spike Jonze's Her (2013), used as cultural inspiration only. No assets, scripts, or trademarks of Warner Bros. are included in this repository. The default companion name is the common given name "Samantha"; the project name is her301.
