An open-source voice AI companion. Real-time speech via Gemini Live, persistent memory via Zep, persona-as-data, and a tool registry that actually does things.
The voice companion the movie Her promised — open source, self-hostable, and built around the four technologies that finally made it possible: Gemini Live for sub-second audio, Zep for memory across sessions, Genkit for tool calling, and a persona-as-data architecture that turns the same tool registry into completely different companions.
What it is: a voice-first AI companion you can run yourself, with a real persona, real memory, and tools that affect the real world.
What it isn't: a voice agent framework (use Pipecat / Vapi / Retell / LiveKit), a closed companion product (Replika / Pi / ChatGPT Voice), or a black-box SDK wrapper that hides the protocol. The repo ships both Gemini Live paths — the production version on Google's @google/genai SDK with ephemeral auth tokens, and the original raw-WebSocket implementation kept in the tutorial as a byte-level teaching reference.
A live Marcus persona is hosted at her301.targeting.ai.
Want to see her use her tools on something you sent? Send a short email or calendar invite to nnfuzzy.her@gmail.com — it's the test inbox the demo reads from. The next time someone runs the demo, Samantha may surface your message in the Gmail / calendar tools. Polite use, please — it's a real Gmail account and bounces aren't fun.
pnpm install
cp .env.example .env # fill in keys (see "Required keys" below)
pnpm db:push # create Postgres tables
pnpm dev # http://localhost:3000

Minimum viable run (text mode only, default theodore persona, no tools that need OAuth):

GOOGLE_API_KEY=… # Gemini key (also covers voice if you turn it on)
HER301_DATABASE_URL=postgresql://…

Full voice mode additionally requires:

GOOGLE_API_KEY=… # Gemini Live is Google-only
ZEP_API_KEY=… # optional but strongly recommended

Voice in/out via Gemini Live, text fallback via DeepSeek/OpenAI/Gemini (configurable). Voice mode is Gemini-only: there is no provider swap for the realtime audio path.
┌────────────┐    tRPC     ┌──────────────────┐   Genkit flow   ┌──────────────┐
│  React UI  │───────────▶ │  Express server  │────────────────▶│  Companion   │
│   (Vite)   │             │  (no auth gate)  │                 │ (text mode)  │
└─────┬──────┘             └────────┬─────────┘                 └──────┬───────┘
      │                             │                                  │
      │ WebSocket via SDK           │ /api/live/token (rate-limited)   │
      ▼                             ▼                                  ▼
┌────────────────────┐      ┌────────────────┐               ┌──────────────┐
│  Gemini Live API   │      │   PostgreSQL   │               │     Tool     │
│  (audio in/out)    │      │ (Drizzle ORM)  │◀──────────────│   registry   │
└────────────────────┘      └────────────────┘               └──────┬───────┘
                                                                    │
                                                            ┌───────▼──────┐
                                                            │  Zep Cloud   │
                                                            │   (memory)   │
                                                            └──────────────┘
Stack: React 19 + Vite (client) · Express + tRPC 11 (server) · Drizzle ORM + PostgreSQL · Genkit + multi-provider LLMs (text) · Gemini Live via the @google/genai SDK (voice; the raw-WebSocket variant is kept in the tutorial) · Zep Cloud (long-term memory) · Tailwind 4.
Key paths:
- server/_core/index.ts: Express bootstrap + auth middleware + Live API endpoints
- server/routers.ts: tRPC routes (chat, threads, sidebar, expenses)
- server/genkit/{ai,companion}.ts: Genkit instance (multi-provider) + companion flow
- server/live/token.ts: Gemini Live token + system-prompt injection + Zep context
- server/tools/{config,registry}.ts: tool config and dual-resolution (Genkit + Live)
- server/persona/{registry,schema,theodore,marcus}.ts: persona-as-data
- server/memory/zep.ts: Zep Cloud wrapper (gracefully no-ops when ZEP_API_KEY is absent)
- server/auth.ts: Live API per-caller rate limiter (in-memory, IP-keyed)
- client/src/hooks/useGeminiLive.ts: WebSocket client, AudioWorklet capture, PCM playback
- client/src/components/VoiceOverlay.tsx: voice UI
- Node 20+
- pnpm 9+
- PostgreSQL 14+
- (no auth dependency) — her301 ships unauthenticated; bring your own provider for multi-tenant deployments
All env vars are read by server/_core/env.ts. Vite-prefixed vars (none required by default) are picked up from the same file by the client build.
| Variable | Required? | Purpose |
|---|---|---|
| HER301_DATABASE_URL (or DATABASE_URL) | yes | Postgres connection string. The HER301_-prefixed form takes priority. |
| LLM_PROVIDER | optional (default: gemini) | Text-mode provider. One of gemini, deepseek, openai. Voice mode is Gemini-only regardless. |
| DEEPSEEK_API_KEY | conditional | Required if LLM_PROVIDER=deepseek. |
| OPENAI_API_KEY | conditional | Required if LLM_PROVIDER=openai. |
| GOOGLE_API_KEY (or GEMINI_API_KEY) | required for voice | Gemini Live is the only voice path. Also used as text provider when LLM_PROVIDER=gemini. |
| ZEP_API_KEY | optional but recommended | Long-term memory. All Zep operations gracefully no-op when absent. |
| LIVE_MODEL | optional | Override the Gemini Live model. Default: gemini-3.1-flash-live-preview. Set to gemini-3.1-pro-live-preview for higher reasoning quality at ~10× the audio cost. |
| ACTIVE_PERSONA | optional (default: theodore) | One of the slugs registered in server/persona/registry.ts. |
| HER301_LIVE_MAX_PER_DAY | optional (default: 10) | Per-user daily quota for /api/live/token. |
| HER301_LIVE_COOLDOWN_SEC | optional (default: 30) | Min seconds between Live token mints per user. |
| HER301_EMAIL_DRY_RUN | optional | Set to 1 to make composeEmailDraft/sendEmailDraft log instead of touching Gmail. Tests/CI rely on this. |
| TAVILY_API_KEY | optional | Enables webSearch tool. |
| SERPER_API_KEY | optional | Enables serperSearch and getNews tools. |
| WEATHER_API_KEY (WeatherAPI.com) | optional | Enables getWeather tool. |
| GOOGLE_OAUTH_CLIENT_ID | optional | Enables Calendar / Gmail / Tasks / Contacts tools. |
| GOOGLE_OAUTH_CLIENT_SECRET | optional | Enables Calendar / Gmail / Tasks / Contacts tools. |
| GOOGLE_OAUTH_REFRESH_TOKEN | optional | Run pnpm google:setup to get one. |
Cost note on LIVE_MODEL. The default is flash (cheaper). Switch to pro only when you need stronger reasoning during voice; it's roughly 10× the audio cost.
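The precedence and conditional rules in the table can be summarized in a small resolver. This is an illustrative sketch only, not the contents of `server/_core/env.ts`: `resolveEnv` and `ResolvedEnv` are hypothetical names, and preferring `GOOGLE_API_KEY` over `GEMINI_API_KEY` when both are set is an assumption.

```typescript
// Hypothetical sketch of the env rules in the table; not the actual server/_core/env.ts.
type Provider = "gemini" | "deepseek" | "openai";
type Env = Record<string, string | undefined>;

interface ResolvedEnv {
  databaseUrl: string;
  provider: Provider;
  voiceEnabled: boolean; // true only when a Gemini key is present
  memoryEnabled: boolean; // false means Zep calls would no-op
}

function resolveEnv(env: Env): ResolvedEnv {
  // The HER301_-prefixed form takes priority over the generic DATABASE_URL.
  const databaseUrl = env.HER301_DATABASE_URL ?? env.DATABASE_URL;
  if (!databaseUrl) throw new Error("HER301_DATABASE_URL (or DATABASE_URL) is required");

  const provider = (env.LLM_PROVIDER ?? "gemini") as Provider;
  if (!["gemini", "deepseek", "openai"].includes(provider)) {
    throw new Error(`Unknown LLM_PROVIDER: ${provider}`);
  }

  // Assumption: GOOGLE_API_KEY wins when both spellings are set.
  const googleKey = env.GOOGLE_API_KEY ?? env.GEMINI_API_KEY;
  const providerKey: Record<Provider, string | undefined> = {
    gemini: googleKey,
    deepseek: env.DEEPSEEK_API_KEY,
    openai: env.OPENAI_API_KEY,
  };
  if (!providerKey[provider]) throw new Error(`Missing API key for LLM_PROVIDER=${provider}`);

  return {
    databaseUrl,
    provider,
    voiceEnabled: Boolean(googleKey),
    memoryEnabled: Boolean(env.ZEP_API_KEY),
  };
}
```

Passing the environment in as a plain object keeps the rules easy to exercise in tests without touching the real process environment.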
pnpm db:push # generate + apply Drizzle migrations

Schema: users, threads, messages, reminders, expenses (5 tables). Source of truth in drizzle/schema.ts.
her301 ships unauthenticated by default — it's a single-user companion meant to run on your own infra. There's no login screen, no /api gate, no auth dependencies. Cost protection for the Live API still lives in server/auth.ts and rate-limits per request IP.
Adding auth for a multi-tenant deployment. Put any standard Express auth middleware in front of /api:
// server/_core/index.ts (your fork)
import { requireAuth } from "./your-auth"; // Firebase, Auth0, magic link, …
app.use("/api", requireAuth);

Then use the authenticated principal (instead of req.ip) as the rate-limit key in the /api/live/token handler; registerLiveSession(user.uid) works the same.
The Express server already trusts a single proxy hop (app.set("trust proxy", 1)) so req.ip resolves correctly behind a Kubernetes Ingress.
- Zep Cloud: set ZEP_API_KEY. Without it, server/memory/zep.ts:isConfigured() returns false and every Zep call no-ops. Conversations still persist to Postgres.
- Tools requiring API keys: see the table above. Inactive tools are logged at startup by logActiveTools().
- Google services: run pnpm google:setup to walk through the OAuth consent flow and get a refresh token. This single token grants Calendar, Gmail, Tasks, and Contacts access. (Note: ensure the consent screen requests the contacts.readonly scope; the contacts tool will silently return "Contacts aren't set up yet" if it is missing.)
A persona in her301 is data, not code — a typed JSON object validated by PersonaSchema (in server/persona/schema.ts). The same tool registry, memory layer, and Genkit flow produce a completely different companion depending on which persona is loaded.
ACTIVE_PERSONA=theodore # default — canonical "Her" reference (English, LA)
ACTIVE_PERSONA=marcus # Berlin senior backend engineer, post-layoff (German)
Each persona declares:
- companionName, userName, language, optional voiceLanguageHint
- personality: bullet points the LLM internalizes
- userContext: job, location, relationships, habits, emotional state, timezone
- toolGuidance: per-tool tone hint (e.g. "use the calendar to surface interview slots, not to organize his week")
- voiceStyle: voice-mode-specific tone overrides
- intro: first-line behaviour
Each persona also gets isolated memory: its own me-${slug} Zep userId and a stable users.id row in Postgres. Switching ACTIVE_PERSONA cannot leak one persona's memory into another's.
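To make "persona as data" concrete, here is a minimal sketch of a trimmed-down persona object with a runtime check and the per-persona memory id. The real `PersonaSchema` lives in `server/persona/schema.ts` and is not reproduced here; the field subset, the `isPersona` guard, the `zepUserId` helper, and the `"de"` language value are all assumptions for illustration.

```typescript
// Hypothetical, trimmed-down persona shape; not the real PersonaSchema.
interface Persona {
  companionName: string;
  userName: string;
  language: string;
  voiceLanguageHint?: string;
  personality: string[];
  toolGuidance: Record<string, string>;
}

const isNonEmptyString = (x: unknown): x is string => typeof x === "string" && x.length > 0;

// Runtime check in the spirit of schema validation (field list is an assumption).
function isPersona(value: unknown): value is Persona {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const personality = v.personality;
  const toolGuidance = v.toolGuidance;
  return (
    isNonEmptyString(v.companionName) &&
    isNonEmptyString(v.userName) &&
    isNonEmptyString(v.language) &&
    (v.voiceLanguageHint === undefined || isNonEmptyString(v.voiceLanguageHint)) &&
    Array.isArray(personality) &&
    personality.every(isNonEmptyString) &&
    typeof toolGuidance === "object" &&
    toolGuidance !== null
  );
}

// Memory isolation: each persona slug maps to its own Zep user id.
function zepUserId(slug: string): string {
  return `me-${slug}`;
}

const examplePersona: unknown = {
  companionName: "Samantha",
  userName: "Marcus",
  language: "de", // assumed format; the real schema may differ
  personality: ["warm", "direct"],
  toolGuidance: {
    getCalendar: "use the calendar to surface interview slots, not to organize his week",
  },
};
```

Because the persona is plain data, swapping companions only changes which object is validated and loaded; the tool registry and memory layer stay untouched.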
- Create server/persona/<slug>.ts exporting a Persona (see theodore.ts / marcus.ts).
- Register it in server/persona/registry.ts: extend the PersonaSlug type and add an entry to REGISTRY mapping the slug to { persona, dbUserId }. Pick a fresh dbUserId.
- (Optional) Generate seed data via the simulation engine: pnpm simulate:persona --persona=<slug> --instruction="…".
- Push seed data to live services: pnpm seed:persona --persona=<slug> --from-json=scripts/simulate/output/<slug>.json.
The simulation engine itself is a five-stage Genkit pipeline (world → calendar → gmail → reminders → zep) under scripts/simulate/. Each stage feeds the next so cross-references stay coherent (recruiter emails reference real interviews on the calendar, reminders point at calendar events, Zep convos echo the inbox). Override the model with SIMULATE_MODEL=googleai/gemini-3-flash-preview for cheaper runs.
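The "each stage feeds the next" idea can be sketched as a loop that hands every stage the outputs produced so far. Only the five stage names come from the text above; the `Stage` type, the stub `run` functions, and the synchronous shape are hypothetical (the real stages are Genkit flows that call a model).

```typescript
// Illustrative only: stage names come from the text, everything else is hypothetical.
type StageName = "world" | "calendar" | "gmail" | "reminders" | "zep";
type StageOutputs = Partial<Record<StageName, string[]>>;
type Stage = { name: StageName; run: (prior: StageOutputs) => string[] };

const stages: Stage[] = [
  { name: "world", run: () => ["interview at ExampleCo on Thursday"] },
  { name: "calendar", run: (p) => (p.world ?? []).map((fact) => `event: ${fact}`) },
  { name: "gmail", run: (p) => (p.calendar ?? []).map((e) => `email re ${e}`) },
  { name: "reminders", run: (p) => (p.calendar ?? []).map((e) => `remind before ${e}`) },
  { name: "zep", run: (p) => (p.gmail ?? []).map((m) => `conversation about ${m}`) },
];

function runPipeline(pipeline: Stage[]): StageOutputs {
  const outputs: StageOutputs = {};
  for (const stage of pipeline) {
    // Each stage sees everything produced so far, which keeps cross-references coherent.
    outputs[stage.name] = stage.run({ ...outputs });
  }
  return outputs;
}
```

In this toy run the Zep conversation ends up quoting the email, which in turn quotes the calendar event, mirroring the coherence the pipeline is designed for.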
The voice path is Gemini Live only. OpenAI Realtime is a different protocol; Anthropic and DeepSeek have no realtime audio offering. The client connects to wss://generativelanguage.googleapis.com/... using the ephemeral token minted by /api/live/token. The steps below show the message flow at the wire level.
- Client → POST /api/live/token (no login by default; rate-limited per caller IP).
- Server applies the rate limit, mints an ephemeral token, fetches Zep context for the active thread, builds a voice-mode system prompt via buildVoicePrompt(persona, context), and returns { token, wsUrl, useConfigFormat, model, tools, systemPrompt, sessionsRemaining }. The token is the ephemeral credential described in the security note below.
- Client opens the WebSocket and sends { setup: { model, generationConfig, systemInstruction, tools, sessionResumption, realtimeInputConfig } }.
- AudioWorklet captures mic audio as 16 kHz mono PCM, base64-encodes each chunk, and sends it as { realtimeInput: { audio: { data, mimeType: "audio/pcm" } } }.
- Gemini's audio responses (24 kHz PCM) are scheduled into an AudioContext for low-latency playback.
- On turnComplete, the client POSTs the transcribed turn to /api/live/turn, which persists it to both Postgres and Zep.
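The audio-capture step above boils down to converting float samples to 16-bit PCM and base64-encoding them into a `realtimeInput` message. The following is a minimal sketch under that description, not the code in `useGeminiLive.ts`; the helper names `floatTo16BitPcm`, `toBase64`, and `buildAudioMessage` are hypothetical, and little-endian byte order is an assumption.

```typescript
// Sketch: Float32 samples -> 16-bit little-endian PCM -> base64 realtimeInput message.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    view.setInt16(i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  }
  return bytes;
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Message shape as described in the steps above.
function buildAudioMessage(samples: Float32Array) {
  return {
    realtimeInput: {
      audio: { data: toBase64(floatTo16BitPcm(samples)), mimeType: "audio/pcm" },
    },
  };
}
```

In a browser the `Float32Array` would come from the AudioWorklet's input buffer; the same helpers also run under Node, which fits the "test with Node first" habit.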
The gotchas below all live in client/src/hooks/useGeminiLive.ts and server/live/token.ts. Each one cost an evening:
- Setup format: raw WebSocket uses { setup: { … } }, not { config: { … } }. The latter is SDK-only and gets you a close code 1007 with no useful error message.
- Browser Blob parsing: onmessage data arrives as a Blob in browsers. You must await event.data.text() before JSON.parse.
- getUserMedia gesture: must be called from inside a click handler. Calling it from a WebSocket open callback silently fails.
- Module-level WebSocket state: the WebSocket and AudioContext live as module-level vars (not React refs) to avoid stale-closure bugs across re-renders. Tradeoff: HMR requires a full page reload.
- Test with Node first: when adding a new message type, exercise the WebSocket from tsx + the ws package before touching the browser. Error messages surfaced in Node are far clearer than the browser's.
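The Blob gotcha is easiest to avoid with one decoding helper that accepts whatever the runtime delivers. This is a hedged sketch, not the project's handler: `decodeFrame` and `decodeBrowserFrame` are hypothetical names.

```typescript
// Sketch: normalize a WebSocket payload to parsed JSON regardless of how it is delivered.
function decodeFrame(data: string | ArrayBuffer | Uint8Array): unknown {
  const text = typeof data === "string" ? data : new TextDecoder().decode(data);
  return JSON.parse(text);
}

// Browsers hand over a Blob, whose text has to be awaited before parsing.
async function decodeBrowserFrame(
  data: string | ArrayBuffer | Uint8Array | Blob,
): Promise<unknown> {
  if (data instanceof Blob) return JSON.parse(await data.text());
  return decodeFrame(data);
}
```

A browser `onmessage` handler would call `decodeBrowserFrame(event.data)`, while a Node test harness can use `decodeFrame` directly on string or binary frames.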
/api/live/token is rate-limited per caller via server/auth.ts:registerLiveSession. The key is the request IP by default, or the authenticated principal if you add auth:
- max sessions per rolling 24h: HER301_LIVE_MAX_PER_DAY (default 10)
- min seconds between mints: HER301_LIVE_COOLDOWN_SEC (default 30)
Quota exhaustion returns 429 with a Retry-After header and a JSON body containing reason (cooldown | daily_quota) and retryAfterSec.
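The two limits can be pictured as a small in-memory limiter keyed by caller. This sketch follows the rules stated above but is not the code in `server/auth.ts`; `createLimiter`, the `LimitResult` shape, and checking the cooldown before the daily quota are assumptions.

```typescript
// Hypothetical in-memory limiter mirroring the rules above; not the actual server/auth.ts.
type LimitResult =
  | { ok: true; sessionsRemaining: number }
  | { ok: false; reason: "cooldown" | "daily_quota"; retryAfterSec: number };

const DAY_MS = 24 * 60 * 60 * 1000;

function createLimiter(maxPerDay = 10, cooldownSec = 30) {
  const log = new Map<string, number[]>(); // key -> timestamps (ms) of recent mints

  return function register(key: string, now: number = Date.now()): LimitResult {
    // Keep only mints inside the rolling 24h window.
    const recent = (log.get(key) ?? []).filter((t) => now - t < DAY_MS);

    const last = recent[recent.length - 1];
    if (last !== undefined && now - last < cooldownSec * 1000) {
      const waitMs = cooldownSec * 1000 - (now - last);
      return { ok: false, reason: "cooldown", retryAfterSec: Math.ceil(waitMs / 1000) };
    }
    if (recent.length >= maxPerDay) {
      // The oldest mint in the window determines when a slot frees up.
      const waitMs = recent[0] + DAY_MS - now;
      return { ok: false, reason: "daily_quota", retryAfterSec: Math.ceil(waitMs / 1000) };
    }
    recent.push(now);
    log.set(key, recent);
    return { ok: true, sessionsRemaining: maxPerDay - recent.length };
  };
}
```

A handler would map a failed result to a 429 response, copying `retryAfterSec` into both the `Retry-After` header and the JSON body. Because the state is a plain `Map`, restarting the process clears it.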
Security note. The browser only ever sees an ephemeral token (single-use, 30 minutes), minted server-side via GoogleGenAI.authTokens.create. The project Gemini API key never leaves the her301 pod. Combined with the per-IP rate limit, this is the recommended posture for both single-user self-hosting and a public hosted demo.
her301 ships with 25 tool functions across 11 categories. Each tool is enabled in server/tools/config.ts and gated on an env key; missing keys disable the tool gracefully, and the disabled tools are logged at startup.
| Category | Tool IDs | Required env |
|---|---|---|
| Web search | webSearch (Tavily), serperSearch | TAVILY_API_KEY / SERPER_API_KEY |
| Calendar | getCalendar, createEvent, deleteEvent, updateEvent | GOOGLE_OAUTH_REFRESH_TOKEN |
| Email (read) | searchInbox, readEmail | GOOGLE_OAUTH_REFRESH_TOKEN (+ gmail.readonly) |
| Email (send) | composeEmailDraft, sendEmailDraft | GOOGLE_OAUTH_REFRESH_TOKEN (+ gmail.compose) |
| Tasks | getTasks, addTask, completeTask | GOOGLE_OAUTH_REFRESH_TOKEN |
| Contacts | searchContacts | GOOGLE_OAUTH_REFRESH_TOKEN (+ contacts.readonly) |
| Notes | createNote, listRecentNotes, searchNotes, deleteNote | (Postgres only; uses FTS via to_tsvector) |
| Reminders | setReminder, getReminders | (Postgres only) |
| Expenses | logExpense, getExpenseSummary, getRecentExpenses | (Postgres only) |
| Weather | getWeather | WEATHER_API_KEY (WeatherAPI.com) |
| News | getNews | SERPER_API_KEY |
Email-send safety pattern. composeEmailDraft always creates a Gmail draft first and returns a preview; sendEmailDraft actually sends it. The persona prompt instructs the companion to read the draft back to the user and only send after explicit verbal confirmation. Set HER301_EMAIL_DRY_RUN=1 to make both tools log-only — used in tests/CI so they never accidentally send mail. To enable email sending, run pnpm google:setup to mint a refresh token with the gmail.compose scope.
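The draft-then-send pattern, including the dry-run switch, can be sketched with an in-memory stand-in for Gmail. Only the two tool names and the dry-run behaviour come from the text; `createEmailTools`, the `Draft` shape, and the return strings are hypothetical.

```typescript
// Illustrative two-step flow with an in-memory stand-in for Gmail; names are hypothetical.
interface Draft { id: string; to: string; subject: string; body: string }

function createEmailTools(dryRun: boolean) {
  const drafts = new Map<string, Draft>();
  const sent: Draft[] = [];
  const log: string[] = [];
  let nextId = 1;

  return {
    sent,
    log,
    // Step 1: create a draft and return a preview the companion can read back.
    composeEmailDraft(to: string, subject: string, body: string): string {
      const draft: Draft = { id: `draft-${nextId++}`, to, subject, body };
      if (dryRun) log.push(`[dry-run] compose ${draft.id}`);
      drafts.set(draft.id, draft);
      return `Draft ${draft.id} to ${to}: "${subject}". Confirm to send.`;
    },
    // Step 2: send only a draft that already exists, and only when explicitly asked.
    sendEmailDraft(draftId: string): string {
      const draft = drafts.get(draftId);
      if (!draft) return `No draft with id ${draftId}.`;
      drafts.delete(draftId);
      if (dryRun) {
        log.push(`[dry-run] send ${draft.id}`);
        return `Dry run: ${draft.id} was logged, not sent.`;
      }
      sent.push(draft);
      return `Sent "${draft.subject}" to ${draft.to}.`;
    },
  };
}
```

Splitting compose and send into separate tool calls means the model cannot send mail in a single step; a confirmation turn always sits between the two.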
Tools are dual-resolved: resolveTools() returns Genkit ToolAction[] for the text companion flow; resolveToolDeclarations() returns plain JSON declarations for the Live API. Adding a new tool means three updates: the tool file in server/tools/, an entry in TOOL_MAP + LIVE_DECLARATIONS_MAP + TOOL_EXECUTOR_MAP (server/tools/registry.ts), and a ToolConfig entry in server/tools/config.ts.
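A miniature of the registry shows how one config list can drive both the Live declarations and the executor lookup. The map names echo the ones mentioned above, but their shapes here are assumptions, `activeToolIds` and `resolveExecutor` are hypothetical helpers, and the Genkit `ToolAction` side is left out to keep the sketch dependency-free.

```typescript
// Hypothetical miniature of the registry; the real maps live in server/tools/registry.ts.
interface ToolConfig { id: string; enabled: boolean; requiredEnv?: string }
interface LiveDeclaration { name: string; description: string }
type Executor = (args: Record<string, unknown>) => Promise<string>;
type EnvVars = Record<string, string | undefined>;

const TOOL_CONFIG: ToolConfig[] = [
  { id: "getWeather", enabled: true, requiredEnv: "WEATHER_API_KEY" },
  { id: "setReminder", enabled: true }, // Postgres only, no extra key
];

const LIVE_DECLARATIONS_MAP: Record<string, LiveDeclaration> = {
  getWeather: { name: "getWeather", description: "Current weather for a city" },
  setReminder: { name: "setReminder", description: "Store a reminder" },
};

const TOOL_EXECUTOR_MAP: Record<string, Executor> = {
  getWeather: async (args) => `Weather lookup for ${String(args.city)}`,
  setReminder: async (args) => `Reminder saved: ${String(args.text)}`,
};

// A tool is active when it is enabled and its required env key (if any) is set.
function activeToolIds(env: EnvVars): string[] {
  return TOOL_CONFIG
    .filter((t) => t.enabled && (!t.requiredEnv || Boolean(env[t.requiredEnv])))
    .map((t) => t.id);
}

// Live API path: plain JSON declarations for the session setup.
function resolveToolDeclarations(env: EnvVars): LiveDeclaration[] {
  return activeToolIds(env).map((id) => LIVE_DECLARATIONS_MAP[id]);
}

// Execution path: look up the executor when the model calls a tool.
function resolveExecutor(id: string, env: EnvVars): Executor | undefined {
  return activeToolIds(env).includes(id) ? TOOL_EXECUTOR_MAP[id] : undefined;
}
```

Deriving both views from the same active-id list is what keeps a tool from being declared to the model while its executor is disabled.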
Operations are listed from lightest to most invasive. Stop at the cheapest one that solves your problem.
1. Stuck voice state, audio not playing, "Not connected to Samantha" after a fix.
Hard-reload the browser (Cmd+Shift+R / Ctrl+Shift+R). Module-level session, audioCtx, and mediaStream in useGeminiLive.ts survive HMR.
2. Server-side change (env edit, persona switch, code edit that touches startup).
kill $(lsof -ti :3000)
pnpm dev

3. Switch personas.
Set ACTIVE_PERSONA=marcus or ACTIVE_PERSONA=theodore in your .env, then restart the dev server. Personas have isolated Zep userIds (me-${slug}) and Postgres users.id rows — switching doesn't leak state.
4. Reset the Live API rate limiter.
The 10-per-day cap and 30s cooldown live in an in-memory Map (server/auth.ts:liveSessionLog). A dev-server restart clears it. There's no separate reset command. For tests, import _resetRateLimitState from ./auth.
5. Regenerate seed data (no live writes).
The simulation engine chains five Gemini-driven flows: world → calendar → gmail → reminders → zep. Each flow consumes the prior layers' output, so cross-references stay coherent (recruiter emails reference real interviews on the calendar, reminders point at calendar events, Zep conversations echo the inbox).
--instruction is appended to every one of those five prompts as Extra steering: <text>. The persona file does the heavy lifting (relationships, habits, tone, tool guidance — that's all in marcus.ts / theodore.ts). --instruction is additive flavor for this run — the current week's mood, what tensions to surface, what's on his mind. Short generic instructions like "this week" won't actually steer anything; rich, specific ones will.
# Marcus example — names a tension to surface across all five layers
pnpm simulate:persona --persona=marcus --instruction="\
Mid-week, mid-grind. Marcus has two active interview loops (a Berlin Series-B \
fintech and a remote-first scale-up) plus one freelance client. A system-design \
rejection arrived last Friday and he hasn't told Lena. Surface this tension: \
interview prep on the calendar, recruiter follow-ups + one polite 'went another \
direction' email in gmail, LeetCode plus the unmade Hamburg call in reminders, \
one zep conversation where Samantha gently notices he's avoiding the rejection."
# Theodore example
pnpm simulate:persona --persona=theodore --instruction="\
Quiet melancholy week. Theodore is editing letters for grandparents writing to \
their granddaughter who's away at college. The Catherine divorce paperwork \
arrived Tuesday. Surface across: a hesitant therapist appointment on the calendar, \
a paralegal email about signing dates, a reminder about the response he keeps \
deferring, and a zep convo where Samantha notices he's been quieter."

Output goes to scripts/simulate/output/<slug>.json. Cost: a few cents in Gemini Pro calls. Inspect the JSON (calendar event titles, email subjects, Zep convos) before pushing it to live services.
6. Wipe + reseed live services (Postgres reminders, Google Calendar, Gmail labels).
pnpm seed:clean --persona=marcus
pnpm seed:persona --persona=marcus \
  --from-json=scripts/simulate/output/marcus.json

Side effects:
- Postgres: drops Marcus's reminders rows
- Google Calendar: deletes events tagged SEEDED (from your real Google account)
- Gmail: removes the SEEDED label from messages
cleanZep() in scripts/seed-persona.ts currently logs a warning and leaves the thread in place. Use step 7 if you need a clean Zep slate.
7. Wipe Zep memory (manual workaround).
Until cleanZep() is wired to call client.user.delete():
- Open https://app.getzep.com
- Find the user (me-marcus or me-theodore)
- Delete the user; this cascades to all their threads
- Then re-run pnpm seed:persona … to repopulate
8. Reset the database (nuclear).
pnpm db:push --force

Wipes users, threads, messages, reminders, expenses for all personas. After: restart pnpm dev (auto-runs seedDefaultUser), then re-seed each persona.
The 8-chapter tutorial in tutorial/ walks through the full implementation:
- WebSocket protocol and the setup vs config gotcha
- Audio capture (AudioWorklet, 16 kHz PCM) and playback (24 kHz scheduling)
- Tool architecture (config → registry → dual resolution)
- Persona system (data, not code)
- Memory and context (Zep + Postgres dual persistence)
- Tool round-trip (Live API tool calls)
- Failure modes (close codes, mic gestures, stale closures)
- Putting it all together
It's its own Deno project (tutorial/deno.json) with runnable examples in tutorial/examples/.
A reference Kubernetes manifest is shipped under infrastructure/. The image lives in whatever container registry you choose (Artifact Registry, Docker Hub, GHCR…). Highlights of the reference deployment:
- Container image: <your-registry>/her301:latest. Build via docker build -t <tag> . and push to your own registry.
- Port: 3000 (matches the dev server port).
- Env injected from a ConfigMap + per-key Secret (see .env.example for the variable list).
- Ingress wired with cert-manager + Let's Encrypt for https://. Update the host to your domain.
The author's reference deployment runs at her301.targeting.ai. Yours can run anywhere a Node 20+ container can run.
CI: .github/workflows/build-and-push.yml builds and pushes the image only when paths under her301/ change. Auto-PR to main for feature/** branches.
pnpm test # full suite (Vitest)
pnpm test server/auth.test.ts # rate limiter
pnpm test server/tools/ # tool tests

A test in server/tools/expenses.test.ts (timezone offset) currently fails on main. It is unrelated to launch but worth a follow-up before CI gates merges.
PRs welcome. Three things to keep in mind:
- TypeScript must pass: pnpm check is the gate. No any, no as unknown as shortcuts in PRs.
- Tests for behavior changes: especially anything touching auth.ts, the tool registry, or the persona schema. The pattern is functional + Result-style: tools return a string (success message) on success and never throw to the LLM.
- Personas are clean-room. New personas must use fictional names: no real people, no real companies, no defamation surface.
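As a rough illustration of the "never throw to the LLM" convention, a tool body can be wrapped so that failures come back as text. `safeTool` is a hypothetical helper, not something the repo is known to export, and the synchronous signature is a simplification (real tools are likely async).

```typescript
// Sketch of the "never throw to the LLM" convention; safeTool is a hypothetical helper.
type ToolFn<A> = (args: A) => string;

function safeTool<A>(name: string, fn: ToolFn<A>): ToolFn<A> {
  return (args) => {
    try {
      return fn(args);
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      // Failures come back as plain text the model can relay, not as exceptions.
      return `${name} failed: ${detail}`;
    }
  };
}

// Example tool body that may throw internally but never does so past the wrapper.
const logExpenseExample = safeTool("logExpense", (args: { amount: number; label: string }) => {
  if (!(args.amount > 0)) throw new Error("amount must be positive");
  return `Logged ${args.amount.toFixed(2)} for ${args.label}.`;
});
```

Returning a string in both cases lets the model explain a failure in conversation instead of the turn aborting.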
MIT — see LICENSE. Permissive: do whatever you like, just keep the copyright notice.
The persona of Samantha and the cinematic reference are drawn from Spike Jonze's Her (2013), used as cultural inspiration only. No assets, scripts, or trademarks of Warner Bros. are included in this repository. The default companion name is the common given name "Samantha"; the project name is her301.
