---
title: Polymath
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8501
pinned: false
---
Give Polymath a research topic; it returns a sourced markdown report with inline citations and an auto-generated PowerPoint deck. Specialized agents (Planner, Search, Reader, Critic, Writer) are coordinated by a LangGraph state machine, share a Chroma vector store as working memory, and call their tools over MCP.
🎥 _Demo video:_ https://github.com/user-attachments/assets/6b2df806-fd6f-4f33-bd12-21657388c140
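The report format described above (inline citations backed by a sources list) can be illustrated with a small, dependency-free sketch. It assumes each extracted Claim carries its text and one source URL, which is a hypothetical shape and not necessarily Polymath's actual model. Sources are numbered in first-seen order, so claims from the same page share one citation marker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical shape; Polymath's real Claim model may carry more fields.
    text: str
    source_url: str


def render_cited_report(title: str, claims: list[Claim]) -> str:
    """Render claims as markdown bullets with [n] inline citations and a numbered Sources list."""
    numbers: dict[str, int] = {}  # source URL -> citation number, assigned in first-seen order
    lines = [f"# {title}", ""]
    for claim in claims:
        n = numbers.setdefault(claim.source_url, len(numbers) + 1)
        lines.append(f"- {claim.text} [{n}]")
    lines += ["", "## Sources", ""]
    for url, n in numbers.items():
        lines.append(f"{n}. {url}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_cited_report("Solid-state batteries", [
        Claim("Claim A.", "https://example.com/a"),
        Claim("Claim B.", "https://example.com/b"),
        Claim("Claim C.", "https://example.com/a"),  # same source, reuses [1]
    ]))
```

Reusing the number for a repeated URL keeps the Sources list free of duplicates. A real Writer would weave claims into prose rather than bullets.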
```text
      topic
        │
        ▼
   ┌─────────┐  decomposes topic → subtasks
   │ Planner │
   └────┬────┘
        ▼
   ┌─────────┐  web_search (via MCP) → URLs
┌─►│ Search  │
│  └────┬────┘
│       ▼
│  ┌─────────┐  page_fetch (via MCP) → trafilatura → Reader extracts Claims
│  │ Reader  │ ──────────► Chroma vector store (working memory)
│  └────┬────┘                  │
│       ▼                       │ all claims so far
│  ┌─────────┐ ◄────────────────┘
│  │ Critic  │  gaps/contradictions?
│  └────┬────┘
└───────┤ continue (new subtasks): back to Search, loop ≤ 3×
        │ stop
        ▼
   ┌─────────┐  synthesizes from Claims
   │ Writer  │
   └────┬────┘
   ┌────┴───────────┐
   ▼                ▼
report.md       deck.pptx
```
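The Critic's continue/stop decision is the only branch in the graph. The sketch below models it in plain Python (no LangGraph) as a routing function plus a driver loop with stub nodes. The state fields (`iterations`, `gaps`, `trace`) are hypothetical, and "loop ≤ 3×" is interpreted here as at most three Search/Reader/Critic passes; neither is taken from Polymath's actual state schema.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field

MAX_ITERATIONS = 3  # assumption: "loop ≤ 3×" means at most three Search/Reader/Critic passes


@dataclass
class ResearchState:
    # Hypothetical state; Polymath's real LangGraph state schema may differ.
    subtasks: list[str]
    iterations: int = 0
    gaps: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


def route_after_critic(state: ResearchState) -> str:
    """Conditional edge: go back to Search while gaps remain and the cap is not reached."""
    if state.gaps and state.iterations < MAX_ITERATIONS:
        return "search"
    return "writer"


def run(state: ResearchState, critic: Callable[[ResearchState], list[str]]) -> ResearchState:
    """Drive Planner -> Search -> Reader -> Critic -> (Search | Writer) with stub nodes."""
    state.trace.append("planner")
    node = "search"
    while node == "search":
        state.trace += ["search", "reader", "critic"]  # stand-ins for the real agents
        state.iterations += 1
        state.gaps = critic(state)  # Critic reports gaps/contradictions
        if state.gaps:
            state.subtasks = list(state.gaps)  # gaps become the next round's subtasks
        node = route_after_critic(state)
    state.trace.append("writer")
    return state
```

In LangGraph, a router like this would be attached with something like `graph.add_conditional_edges("critic", route_after_critic)`, where the returned strings are node names. Keeping the counter in the state bounds the loop even if the Critic never runs out of gaps.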
Tools (`web_search`, `page_fetch`, `claim_extract`) are exposed by a local MCP server
(`mcp_server/server.py`) and called through an MCP client over stdio.
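To make "over stdio" concrete, here is a minimal sketch of the wire format: MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages on the server's stdin/stdout. The code only builds message lines and never launches `mcp_server/server.py`. The helper names are hypothetical, and the `query` argument for `web_search` is an assumption about the tool's schema.

```python
from __future__ import annotations

import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session


def jsonrpc_line(method: str, params: dict | None = None, *, notification: bool = False) -> str:
    """Encode one JSON-RPC 2.0 message as a single newline-terminated line (stdio framing)."""
    message: dict = {"jsonrpc": "2.0", "method": method}
    if not notification:
        message["id"] = next(_request_ids)  # notifications carry no id
    if params is not None:
        message["params"] = params
    # Compact json.dumps output contains no raw newlines, so "\n" can delimit messages.
    return json.dumps(message) + "\n"


def tool_call_line(name: str, arguments: dict) -> str:
    """Build the tools/call request a client writes to the server's stdin."""
    return jsonrpc_line("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # After the initialize handshake, a web_search call might look like this
    # (the "query" argument name is an assumption, not Polymath's confirmed schema).
    print(tool_call_line("web_search", {"query": "solid-state batteries"}), end="")
```

In practice the MCP SDK's client session handles this framing, along with the `initialize` request and `notifications/initialized` notification that must precede any tool call.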
| Phase | What | Acceptance |
|---|---|---|
| 1 | Single-agent baseline (LLM + tools → cited summary) | ≥5 cited claims, no hallucinated URLs |
| 2 | Structured claim extraction (Reader + Pydantic, retry) | 100% valid on first try across 5 topics |
| 3 | Chroma memory + Critic continue/stop loop | omitted aspect detected in 5/5 cases |
| 4 | LangGraph orchestration (Planner…Writer, conditional edges) | runs end-to-end with per-node trace |
| 5 | PPTX deck + tools over MCP | workflow runs over MCP; valid PPTX |
| 6 | Streamlit UI + deploy | enter topic → download both artifacts |
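Phase 2's validate-and-retry pattern can be sketched without dependencies. Here a dataclass with a manual `from_json` check stands in for the Pydantic v2 model (with Pydantic, `Claim.model_validate_json` would raise `ValidationError` instead). `call_llm` is a hypothetical callable that returns the model's raw JSON text, and the field names are assumptions.

```python
from __future__ import annotations

import json
from collections.abc import Callable
from dataclasses import dataclass


class InvalidClaim(ValueError):
    """Raised when LLM output does not match the expected claim shape."""


@dataclass(frozen=True)
class Claim:
    # Hypothetical fields; a stdlib stand-in for the project's Pydantic v2 model.
    text: str
    source_url: str

    @classmethod
    def from_json(cls, raw: str) -> Claim:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise InvalidClaim(f"not valid JSON: {exc}") from exc
        if not isinstance(data, dict):
            raise InvalidClaim("expected a JSON object")
        text, url = data.get("text"), data.get("source_url")
        if not isinstance(text, str) or not text.strip():
            raise InvalidClaim("'text' must be a non-empty string")
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise InvalidClaim("'source_url' must be an http(s) URL")
        return cls(text=text.strip(), source_url=url)


def extract_claim(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> tuple[Claim, int]:
    """Ask for a claim, feeding the validation error back into the prompt on each retry.

    Returns the claim and the attempt number it succeeded on (1 = first try).
    """
    error = ""
    for attempt in range(1, max_attempts + 1):
        full_prompt = f"{prompt}\n\nPrevious output was invalid: {error}" if error else prompt
        try:
            return Claim.from_json(call_llm(full_prompt)), attempt
        except InvalidClaim as exc:
            error = str(exc)
    raise InvalidClaim(f"gave up after {max_attempts} attempts: {error}")
```

Returning the attempt number makes a first-try validity rate straightforward to compute over a batch of topics.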
Python 3.11+ · uv · LangGraph · OpenRouter (free-tier models, routed in
`models/router.py`) · Pydantic v2 · Chroma (+ ONNX all-MiniLM-L6-v2) · Tavily ·
trafilatura · MCP · python-pptx · Streamlit · pytest.
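Free-tier models are routed in `models/router.py`. A minimal sketch of one way such routing could work: each agent role gets an ordered list of candidate models, and any model marked unavailable (for example after a rate-limit error) is skipped. The role names come from the architecture above; the model slugs and the `pick_model` helper are invented placeholders, not Polymath's actual configuration.

```python
from __future__ import annotations

# Placeholder slugs (invented). OpenRouter's free variants carry a ":free" suffix.
ROUTES: dict[str, list[str]] = {
    "planner": ["example/large-model:free", "example/medium-model:free"],
    "reader": ["example/medium-model:free", "example/small-model:free"],
    "critic": ["example/large-model:free", "example/medium-model:free"],
    "writer": ["example/large-model:free", "example/medium-model:free"],
}


def pick_model(role: str, unavailable: set[str] | None = None) -> str:
    """Return the first candidate for `role` that is not marked unavailable."""
    skip = unavailable or set()
    for model in ROUTES[role]:
        if model not in skip:
            return model
    raise RuntimeError(f"no available model for role {role!r}")
```

Keeping routing in one table makes swapping a rate-limited or retired free model a one-line change; a caller would add a failing model to `unavailable` and retry.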
```bash
uv sync                # create .venv and install deps
cp .env.example .env   # fill in OPENROUTER_API_KEY and TAVILY_API_KEY

# Web app (recommended):
uv run streamlit run app.py

# Or the full pipeline from the CLI (search + fetch over MCP) → .md + .pptx in outputs/:
uv run python -m polymath.graph.workflow --topic "current state of solid-state batteries"
```

Get free keys at openrouter.ai and tavily.com.
```bash
uv run pytest   # full suite (offline; no API keys needed)
```

Acceptance evals (these make live API calls):

```bash
uv run python eval/run_eval.py --topics 5 --pages 2   # Week 2: extraction validity
uv run python eval/run_critic_eval.py                 # Week 3: Critic gap detection
```

This repo doubles as a Hugging Face Docker Space (its config is the YAML frontmatter at the top of
this file; the build uses the `Dockerfile`, which runs the Streamlit app). To deploy:
- Create a new Space on Hugging Face → SDK: Docker → Streamlit template → CPU Basic (Free).
- Push this repo to the Space remote (here a git remote named `hf`): `git push hf main --force`.
- In Settings → Variables and secrets, add `OPENROUTER_API_KEY` and `TAVILY_API_KEY` as secrets.
- The Space builds from the `Dockerfile` (which installs `requirements.txt`). The first run downloads the ~80 MB ONNX embedding model once; it is cached afterward.
The app reads keys from environment variables (via `config.py`), so the secrets are
picked up automatically.
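A minimal sketch of environment-based configuration in this spirit, assuming a config module that exposes the two keys and fails fast when one is missing. The `Settings` class and `load_settings` function are hypothetical; the real `config.py` may also load `.env` (for example via python-dotenv).

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_KEYS = ("OPENROUTER_API_KEY", "TAVILY_API_KEY")


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    tavily_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read API keys from the environment (Space secrets arrive as environment variables)."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail at startup with a clear message instead of on the first API call.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
    )
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to test offline.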
Per-week scripts:

```bash
uv run python scripts/week1_baseline.py "..."                      # Week 1
uv run python scripts/week2_reader.py "..." --pages 3              # Week 2
uv run python scripts/week3_research.py "..." --max-iterations 3   # Week 3
```

See `PROJECT_SPEC.md` for the full design.