FastAPI LangGraph Agent Template

A production-ready template for building AI agent backends with FastAPI and LangGraph. Handles the hard parts — stateful conversations, long-term memory, tool calling, observability, rate limiting, auth — so you can focus on your agent logic.

Built for AI engineers who want a solid foundation, not a tutorial project.

What's included

LangGraph stateful agent with checkpointing, tool calling, and human-in-the-loop support
Long-term memory via mem0 + pgvector — semantic search per user, cache-backed
LLM service with circular model fallback, exponential backoff retries, and total timeout budget
Langfuse tracing on all LLM calls; Prometheus metrics + Grafana dashboards
JWT auth with session management; rate limiting via slowapi
Alembic migrations; optional Valkey/Redis cache layer
Structured logging with request/session/user context on every line

Quickstart

bash
git clone <repo-url> my-agent && cd my-agent
cp .env.example .env.development   # fill in your keys
make install
make docker-up                     # starts API + PostgreSQL

Open http://localhost:8000/docs to see the interactive API.

For local development without Docker see docs/getting-started.md.

Documentation

Guide	What it covers
Getting Started	Prerequisites, local setup, first API call
Architecture	System design, request flow, component diagrams
Configuration	All environment variables with defaults
Authentication	JWT flow, sessions, endpoint reference
Database & Migrations	Schema, Alembic migrations, pgvector
LLM Service	Models, retries, fallback, timeout budget
Memory	mem0 long-term memory, cache layer
Observability	Langfuse, structured logging, Prometheus, profiling
Evaluation	Eval framework, custom metrics, reports
Docker	Docker, Compose, full monitoring stack

Project structure

app/
  api/v1/          # Route handlers
  core/
    langgraph/     # Agent graph + tools
    prompts/       # System prompt template
    cache.py       # Valkey/Redis + in-memory fallback
    config.py      # Settings
    middleware.py  # Metrics, logging context, profiling
    limiter.py     # Rate limiting
  models/          # SQLModel ORM models
  schemas/         # Pydantic request/response schemas
  services/        # LLM, database, memory services
alembic/           # Database migrations
evals/             # LLM evaluation framework

Contributing

PRs welcome. Please read docs/getting-started.md to get your environment set up, then follow the coding conventions in AGENTS.md.

Report security issues privately — see SECURITY.md.

License

See LICENSE.

FAQ

General

What is this template?
A production-ready foundation for AI agent backends built on FastAPI + LangGraph. It bundles the components you'd otherwise wire up by hand: stateful conversations, long-term memory, tool calling, observability, rate limiting, and JWT auth.

How does this differ from a basic LangGraph setup?
The base LangGraph quickstart stops at "agent runs locally". This template adds Alembic migrations, mem0 + pgvector long-term memory, Langfuse tracing, Prometheus + Grafana dashboards, JWT sessions, slowapi rate limiting, structured logging with per-request context, and a circular-fallback LLM service — production concerns you'd otherwise build separately.

Setup & Configuration

Do I need Docker?
Recommended but not required. make docker-up starts the API + PostgreSQL together. For local-only setup see docs/getting-started.md.

Which LLM providers are supported?
Today: OpenAI only via the LLMRegistry in app/services/llm/registry.py. Multi-provider support (Anthropic, Google, OpenRouter) via LangChain's init_chat_model is planned — see #51. Configure your model via DEFAULT_LLM_MODEL in .env.development.

How do I configure long-term memory?
Long-term memory is self-hosted: mem0 runs in-process and persists into your existing PostgreSQL via pgvector — there is no separate mem0 cloud account or API key. You only need a working OPENAI_API_KEY (used for fact extraction + embeddings) and the pgvector extension enabled. See docs/memory.md for details.

Development

How do I add a custom tool?
Drop a LangChain @tool-decorated function in app/core/langgraph/tools/ and register it in the tools list exported from that package. The agent picks it up on next start; no graph changes needed.

How does the LLM service handle failures?
Two layers: (1) per-call exponential-backoff retry via tenacity, (2) circular fallback — if the active model exhausts its retries, the service rotates to the next model in LLMRegistry and continues. A total timeout budget caps the whole call so latency stays bounded. See docs/llm-service.md.

Can I use this without Langfuse?
Yes. Set LANGFUSE_TRACING_ENABLED=false (or omit the Langfuse keys). The agent runs unchanged; structured logs still capture request/session/user context.

Troubleshooting

The API won't start

Ensure PostgreSQL is running (make docker-up brings it up alongside the API)
Confirm .env.development exists — copy from .env.example and fill in required keys
Apply migrations: make migrate

Memory / semantic search returns nothing

Verify the pgvector extension is enabled in your PostgreSQL instance
Confirm OPENAI_API_KEY is valid (mem0 calls OpenAI for fact extraction + embeddings)
Check LONG_TERM_MEMORY_MODEL and LONG_TERM_MEMORY_EMBEDDER_MODEL are set in .env.development

Rate limiting is too aggressive
Limits are defined in app/core/limiter.py (slowapi). Adjust per-route decorators or the default rate in that file. See docs/configuration.md for the related env vars.