Gitpedia

Fastapi langgraph agent production ready template

A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services.

From wassim249·Updated June 1, 2026·View on GitHub·

A production-ready template for building AI agent backends with FastAPI and LangGraph. Handles the hard parts — stateful conversations, long-term memory, tool calling, observability, rate limiting, auth — so you can focus on your agent logic. The project is written primarily in Python, distributed under the MIT License license, first published in 2025. It has gained significant community traction with 2,339 stars and 544 forks on GitHub. Key topics include: agent, agentic-ai, docker, fastapi, fastapi-template.

FastAPI LangGraph Agent Template

A production-ready template for building AI agent backends with FastAPI and LangGraph. Handles the hard parts — stateful conversations, long-term memory, tool calling, observability, rate limiting, auth — so you can focus on your agent logic.

Built for AI engineers who want a solid foundation, not a tutorial project.

What's included

  • LangGraph stateful agent with checkpointing, tool calling, and human-in-the-loop support
  • Long-term memory via mem0 + pgvector — semantic search per user, cache-backed
  • LLM service with circular model fallback, exponential backoff retries, and total timeout budget
  • Langfuse tracing on all LLM calls; Prometheus metrics + Grafana dashboards
  • JWT auth with session management; rate limiting via slowapi
  • Alembic migrations; optional Valkey/Redis cache layer
  • Structured logging with request/session/user context on every line

Quickstart

bash
git clone <repo-url> my-agent && cd my-agent cp .env.example .env.development # fill in your keys make install make docker-up # starts API + PostgreSQL

Open http://localhost:8000/docs to see the interactive API.

For local development without Docker see docs/getting-started.md.

Documentation

GuideWhat it covers
Getting StartedPrerequisites, local setup, first API call
ArchitectureSystem design, request flow, component diagrams
ConfigurationAll environment variables with defaults
AuthenticationJWT flow, sessions, endpoint reference
Database & MigrationsSchema, Alembic migrations, pgvector
LLM ServiceModels, retries, fallback, timeout budget
Memorymem0 long-term memory, cache layer
ObservabilityLangfuse, structured logging, Prometheus, profiling
EvaluationEval framework, custom metrics, reports
DockerDocker, Compose, full monitoring stack

Project structure

app/
  api/v1/          # Route handlers
  core/
    langgraph/     # Agent graph + tools
    prompts/       # System prompt template
    cache.py       # Valkey/Redis + in-memory fallback
    config.py      # Settings
    middleware.py  # Metrics, logging context, profiling
    limiter.py     # Rate limiting
  models/          # SQLModel ORM models
  schemas/         # Pydantic request/response schemas
  services/        # LLM, database, memory services
alembic/           # Database migrations
evals/             # LLM evaluation framework

Contributing

PRs welcome. Please read docs/getting-started.md to get your environment set up, then follow the coding conventions in AGENTS.md.

Report security issues privately — see SECURITY.md.

License

See LICENSE.

FAQ

General

What is this template?
A production-ready foundation for AI agent backends built on FastAPI + LangGraph. It bundles the components you'd otherwise wire up by hand: stateful conversations, long-term memory, tool calling, observability, rate limiting, and JWT auth.

How does this differ from a basic LangGraph setup?
The base LangGraph quickstart stops at "agent runs locally". This template adds Alembic migrations, mem0 + pgvector long-term memory, Langfuse tracing, Prometheus + Grafana dashboards, JWT sessions, slowapi rate limiting, structured logging with per-request context, and a circular-fallback LLM service — production concerns you'd otherwise build separately.

Setup & Configuration

Do I need Docker?
Recommended but not required. make docker-up starts the API + PostgreSQL together. For local-only setup see docs/getting-started.md.

Which LLM providers are supported?
Today: OpenAI only via the LLMRegistry in app/services/llm/registry.py. Multi-provider support (Anthropic, Google, OpenRouter) via LangChain's init_chat_model is planned — see #51. Configure your model via DEFAULT_LLM_MODEL in .env.development.

How do I configure long-term memory?
Long-term memory is self-hosted: mem0 runs in-process and persists into your existing PostgreSQL via pgvector — there is no separate mem0 cloud account or API key. You only need a working OPENAI_API_KEY (used for fact extraction + embeddings) and the pgvector extension enabled. See docs/memory.md for details.

Development

How do I add a custom tool?
Drop a LangChain @tool-decorated function in app/core/langgraph/tools/ and register it in the tools list exported from that package. The agent picks it up on next start; no graph changes needed.

How does the LLM service handle failures?
Two layers: (1) per-call exponential-backoff retry via tenacity, (2) circular fallback — if the active model exhausts its retries, the service rotates to the next model in LLMRegistry and continues. A total timeout budget caps the whole call so latency stays bounded. See docs/llm-service.md.

Can I use this without Langfuse?
Yes. Set LANGFUSE_TRACING_ENABLED=false (or omit the Langfuse keys). The agent runs unchanged; structured logs still capture request/session/user context.

Troubleshooting

The API won't start

  • Ensure PostgreSQL is running (make docker-up brings it up alongside the API)
  • Confirm .env.development exists — copy from .env.example and fill in required keys
  • Apply migrations: make migrate

Memory / semantic search returns nothing

  • Verify the pgvector extension is enabled in your PostgreSQL instance
  • Confirm OPENAI_API_KEY is valid (mem0 calls OpenAI for fact extraction + embeddings)
  • Check LONG_TERM_MEMORY_MODEL and LONG_TERM_MEMORY_EMBEDDER_MODEL are set in .env.development

Rate limiting is too aggressive
Limits are defined in app/core/limiter.py (slowapi). Adjust per-route decorators or the default rate in that file. See docs/configuration.md for the related env vars.

Contributors

Showing top 9 contributors by commit count.

View all contributors on GitHub →

This article is auto-generated from wassim249/fastapi-langgraph-agent-production-ready-template via the GitHub API.Last fetched: 6/1/2026