
Ops codegraph tool

Code intelligence CLI — function-level dependency graph across 34 languages, 34-tool MCP server for AI agents, complexity metrics, architecture boundary enforcement, CI quality gates, git diff impact with co-change analysis, hybrid semantic search. Fully local, zero API keys required.

From optave · Updated June 28, 2026

The project is written primarily in TypeScript, is distributed under the Apache License 2.0, and was first published in 2026. Key topics include: ai-agents, architecture, ci-cd, cli, code-analysis.

Latest release: dev-v3.15.1-dev.12 (Dev build 3.15.1-dev.12), June 28, 2026
<p align="center"> <img src="https://img.shields.io/badge/codegraph-dependency%20intelligence-blue?style=for-the-badge&logo=graphql&logoColor=white" alt="codegraph" /> </p> <h1 align="center">codegraph</h1> <p align="center"> <strong>Give your AI the map before it starts exploring.</strong> </p> <p align="center"> <a href="https://www.npmjs.com/package/@optave/codegraph"><img src="https://img.shields.io/npm/v/@optave/codegraph?style=flat-square&logo=npm&logoColor=white&label=npm" alt="npm version" /></a> <a href="https://github.com/optave/ops-codegraph-tool/blob/main/LICENSE"><img src="https://img.shields.io/github/license/optave/ops-codegraph-tool?style=flat-square&logo=opensourceinitiative&logoColor=white" alt="Apache-2.0 License" /></a> <a href="https://github.com/optave/ops-codegraph-tool/actions"><img src="https://img.shields.io/github/actions/workflow/status/optave/ops-codegraph-tool/codegraph-impact.yml?style=flat-square&logo=githubactions&logoColor=white&label=CI" alt="CI" /></a> <img src="https://img.shields.io/badge/node-%3E%3D22.6-339933?style=flat-square&logo=node.js&logoColor=white" alt="Node >= 22.6" /> </p> <p align="center"> <a href="#the-problem">The Problem</a> &middot; <a href="#what-codegraph-does">What It Does</a> &middot; <a href="#-quick-start">Quick Start</a> &middot; <a href="#-commands">Commands</a> &middot; <a href="#-language-support">Languages</a> &middot; <a href="#-ai-agent-integration-core">AI Integration</a> &middot; <a href="#-how-it-works">How It Works</a> &middot; <a href="#-recommended-practices">Practices</a> &middot; <a href="#-roadmap">Roadmap</a> </p>

The Problem

AI agents face an impossible trade-off. They either spend thousands of tokens reading files to understand a codebase's structure — blowing up their context window until quality degrades — or they assume how things work, and the assumptions are often wrong. Either way, things break. The larger the codebase, the worse it gets.

An agent modifies a function without knowing 9 files import it. It misreads what a helper does and builds logic on top of that misunderstanding. It leaves dead code behind after a refactor. The PR gets opened, and your reviewer — human or automated — flags the same structural issues again and again: "this breaks 14 callers," "that function already exists," "this export is now dead." If the reviewer catches it, that's multiple rounds of back-and-forth. If they don't, it can ship to production. Multiply that by every PR, every developer, every repo.

The information to prevent these issues exists — it's in the code itself. But without a structured map, agents lack the context to get it right consistently, reviewers waste cycles on preventable issues, and architecture degrades one unreviewed change at a time.

What Codegraph Does

Codegraph builds a function-level dependency graph of your entire codebase — every function, every caller, every dependency — and keeps it current with sub-second incremental rebuilds.

It parses your code with tree-sitter (native Rust or WASM), stores the graph in SQLite, and exposes it where it matters most:

  • MCP server — AI agents query the graph directly through 34 tools — one call instead of dozens of grep/find/cat invocations
  • CLI — developers and agents explore, query, and audit code from the terminal
  • CI gates — `check` and `manifesto` commands enforce quality thresholds with exit codes
  • Programmatic API — embed codegraph in your own tools via npm install

Instead of editing code without structural context and leaving reviewers to catch the fallout, the agent knows "this function has 14 callers across 9 files" before it touches anything. Dead exports, circular dependencies, and boundary violations surface during development, not during review. The result: PRs that need fewer review rounds.

Free. Open source. Fully local. Zero network calls, zero telemetry. Your code stays on your machine. When you want deeper intelligence, bring your own LLM provider — your code only goes where you choose to send it.

Three commands to a queryable graph:

bash
npm install -g @optave/codegraph
cd your-project
codegraph build

No config files, no Docker, no JVM, no API keys, no accounts. Point your agent at the MCP server and it has structural awareness of your codebase.

Why it matters

| | Without codegraph | With codegraph |
|---|---|---|
| Code review | Reviewers flag broken callers, dead code, and boundary violations round after round | Structural issues are caught during development — PRs pass review with fewer rounds |
| AI agents | Modify `parseConfig()` without knowing 9 files import it — reviewer catches it | `fn-impact parseConfig` shows every caller before the edit — agent fixes it proactively |
| AI agents | Leave dead exports and duplicate helpers behind after refactors | Dead code, cycles, and duplicates surface in real time via hooks and MCP queries |
| AI agents | Produce code that works but doesn't fit the codebase structure | `context <name> -T` returns source, deps, callers, and tests — the agent writes code that fits |
| CI pipelines | Catch test failures but miss structural degradation | `check --staged` fails the build when blast radius or complexity thresholds are exceeded |
| Developers | Inherit a codebase and grep for hours to understand what calls what | `context handleAuth -T` gives the same structured view agents use |
| Architects | Draw boundary rules that erode within weeks | `manifesto` and `boundaries` enforce architecture rules on every commit |

Feature comparison

<sub>Comparison last verified: June 2026. Claims verified against each repo's README/docs. Full analysis: <a href="generated/competitive/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>

Capabilitycodegraph (this repo)code-review-graphnarsil-mcpcodegraph (other)¹axonGitNexus
GitHub stars
Languages34~3032~20313
MCP serverYesYesYesYesYesYes
Dataflow + CFG + AST queryingYesAST onlyYes²
Hybrid search (BM25 + semantic)YesYesKeyword onlyYesYes
Git-aware (diff impact, co-change, branch diff)All 3All 3All 3
Dead code / role classificationYesYesYesYes
Incremental rebuildsO(changed)O(changed)O(n)O(n)³YesO(n)⁵
Architecture rules + CI gateYes
Security scanning (SAST / vuln detection)Intentionally out of scope⁶Yes
Zero config, npm installYes— (pip)YesYesYesYes
Graph export (GraphML / Neo4j / DOT)Yes
Open source + commercial useYes (Apache-2.0)Yes (MIT)Yes (MIT/Apache-2.0)Yes (MIT)Source-available⁷Non-commercial⁸

<sup>¹ colbymchenry/codegraph is an unrelated tool that shares the name. It focuses on reducing AI agent token consumption by pre-indexing code structure for fast context retrieval — not on structural analysis, CI gates, or complexity metrics. ² narsil-mcp added CFG and dataflow in recent versions. ³ colbymchenry/codegraph uses OS file watchers (chokidar) for auto-sync — rebuild triggers on file change but re-parses from scratch per file, not O(changed) hashing. ⁴ axon caches file-level parse results; the rebuild strategy is consistent with file-level incremental behaviour but has not been independently benchmarked for O(changed) complexity. ⁵ GitNexus skips re-index if the git commit hasn't changed, but re-processes the entire repo when it does — no per-file incremental parsing. ⁶ Codegraph focuses on structural understanding, not vulnerability detection — use dedicated SAST tools (Semgrep, CodeQL, Snyk) for that. ⁷ axon claims MIT in pyproject.toml but has no LICENSE file in the repo. ⁸ GitNexus uses the PolyForm Noncommercial 1.0.0 license.</sup>

What makes codegraph different

| | Differentiator | In practice |
|---|---|---|
| 🤖 | AI-first architecture | 34-tool MCP server — agents query the graph directly instead of scraping the filesystem. One call replaces 20+ grep/find/cat invocations |
| 🏷️ | Role classification | Every symbol auto-tagged as entry/core/utility/adapter/dead/leaf — agents understand a symbol's architectural role without reading surrounding code |
| 🔬 | Function-level, not just files | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes |
| | Always-fresh graph | Three-tier change detection: journal (O(changed)) → mtime+size (O(n) stats) → hash (O(changed) reads). Sub-second rebuilds — agents work with current data |
| 💥 | Git diff impact | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — enriched with historically coupled files from git co-change analysis. Ships with a GitHub Actions workflow |
| 🌐 | Multi-language, one graph | 34 languages in a single graph — JS/TS, Python, Go, Rust, Java, C#, PHP, Ruby, C/C++, Kotlin, Swift, Scala, Bash, HCL, Elixir, Lua, Dart, Zig, Haskell, OCaml, F#, Gleam, Clojure, Julia, R, Erlang, Solidity, Objective-C, CUDA, Groovy, Verilog — agents don't need per-language tools |
| 🧠 | Hybrid search | BM25 keyword + semantic embeddings fused via RRF — hybrid (default), semantic, or keyword mode; multi-query via "auth; token; JWT" |
| 🔬 | Dataflow + CFG | Track how data flows through and between functions — function-level edges (flows_to, returns, mutates), interprocedural variable-level edges (arg_in, return_out, def_use), and intraprocedural control flow graphs — all 34 languages |
| 🔓 | Fully local, zero cost | No API keys, no accounts, no network calls. Optionally bring your own LLM provider — your code only goes where you choose |

🚀 Quick Start

bash
npm install -g @optave/codegraph
cd your-project
codegraph build  # → .codegraph/graph.db created

That's it. The graph is ready. Now connect your AI agent.

For AI agents (primary use case)

Connect directly via MCP — your agent gets 34 tools to query the graph:

bash
codegraph mcp # 34-tool MCP server — AI queries the graph directly

Or add codegraph to your agent's instructions (e.g. CLAUDE.md):

markdown
Before modifying code, always:
1. `codegraph where <name>` — find where the symbol lives
2. `codegraph context <name> -T` — get full context (source, deps, callers)
3. `codegraph fn-impact <name> -T` — check blast radius before editing

After modifying code:
4. `codegraph diff-impact --staged -T` — verify impact before committing

Full agent setup: AI Agent Guide · CLAUDE.md template

For developers

The same graph is available via CLI:

bash
codegraph map                # see most-connected files
codegraph query myFunc       # find any function, see callers & callees
codegraph deps src/index.ts  # file-level import/export map

Or install from source:

bash
git clone https://github.com/optave/ops-codegraph-tool.git
cd ops-codegraph-tool && npm install && npm link

Dev builds: Pre-release tarballs are attached to GitHub Releases. Install with npm install -g <path-to-tarball>. Note that npm install -g <tarball-url> does not work because npm cannot resolve optional platform-specific dependencies from a URL — download the .tgz first, then install from the local file.


✨ Features

| | Feature | Description |
|---|---|---|
| 🤖 | MCP server | 34-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |
| 🎯 | Deep context | `context` gives agents source, deps, callers, signature, and tests for a function in one call; `audit --quick` gives structural summaries |
| 🏷️ | Node role classification | Every symbol auto-tagged as entry/core/utility/adapter/dead/leaf based on connectivity — agents instantly know architectural role |
| 📦 | Batch querying | Accept a list of targets and return all results in one JSON payload — enables multi-agent parallel dispatch |
| 💥 | Impact analysis | Trace every file affected by a change (transitive) |
| 🧬 | Function-level tracing | Call chains, caller trees, function-level impact, and A→B pathfinding with qualified call resolution |
| 📍 | Fast lookup | `where` shows exactly where a symbol is defined and used — minimal, fast |
| 🔍 | Symbol search | Find any function, class, or method by name — exact match priority, relevance scoring, `--file` and `--kind` filters |
| 📁 | File dependencies | See what a file imports and what imports it |
| 📊 | Diff impact | Parse `git diff`, find overlapping functions, trace their callers |
| 🔗 | Co-change analysis | Analyze git history for files that always change together — surfaces hidden coupling the static graph can't see; enriches `diff-impact` with historically coupled files |
| 🗺️ | Module map | Bird's-eye view of your most-connected files |
| 🏗️ | Structure & hotspots | Directory cohesion scores, fan-in/fan-out hotspot detection, module boundaries |
| 🔄 | Cycle detection | Find circular dependencies at file or function level |
| 📤 | Export | DOT, Mermaid, JSON, GraphML, GraphSON, and Neo4j CSV graph export |
| 🧠 | Semantic search | Embeddings-powered natural language search with multi-query RRF ranking |
| 👀 | Watch mode | Incrementally update the graph as files change |
| | Always fresh | Three-tier incremental detection — sub-second rebuilds even on large codebases |
| 🔬 | Data flow analysis | Intraprocedural parameter tracking, return consumers, argument flows, and mutation detection — all 34 languages |
| 🧮 | Complexity metrics | Cognitive, cyclomatic, nesting depth, Halstead, and Maintainability Index per function |
| 🏘️ | Community detection | Leiden clustering to discover natural module boundaries and architectural drift |
| 📜 | Manifesto rule engine | Configurable pass/fail rules with warn/fail thresholds for CI gates via `check` (exit code 1 on fail) |
| 👥 | CODEOWNERS integration | Map graph nodes to CODEOWNERS entries — see who owns each function, ownership boundaries in `diff-impact` |
| 💾 | Graph snapshots | `snapshot save`/`restore` for instant DB backup and rollback — checkpoint before refactoring, restore without rebuilding |
| 🔎 | Hybrid BM25 + semantic search | FTS5 keyword search + embedding-based semantic search fused via Reciprocal Rank Fusion — hybrid, semantic, or keyword modes |
| 📄 | Pagination & NDJSON streaming | Universal `--limit`/`--offset` pagination on all MCP tools and CLI commands; `--ndjson` for newline-delimited JSON streaming |
| 🔀 | Branch structural diff | Compare code structure between two git refs — added/removed/changed symbols with transitive caller impact |
| 🛡️ | Architecture boundaries | User-defined dependency rules between modules with onion architecture preset — violations flagged in `manifesto` and CI |
| | CI validation predicates | `check` command with configurable gates: complexity, blast radius, cycles, boundary violations — exit code 0/1 for CI |
| 📋 | Composite audit | Single `audit` command combining explain + impact + health metrics per function — one call instead of 3-4 |
| 🚦 | Triage queue | `triage` merges connectivity, hotspots, roles, and complexity into a ranked audit priority queue |
| 🔬 | Dataflow analysis | Track how data moves through and between functions — function-level (flows_to, returns, mutates) and interprocedural variable-level edges (arg_in, return_out, def_use) — all 34 languages, included by default, skip with `--no-dataflow` |
| 🧩 | Control flow graph | Intraprocedural CFG construction for all 34 languages — `cfg` command with text/DOT/Mermaid output, included by default, skip with `--no-cfg` |
| 🔎 | AST node querying | Stored queryable AST nodes (calls, new, string, regex, throw, await) — `ast` command with SQL GLOB pattern matching |
| 🧬 | Expanded node/edge types | parameter, property, constant node kinds with parent_id for sub-declaration queries; contains, parameter_of, receiver edge kinds |
| 📊 | Exports analysis | `exports <file>` shows all exported symbols with per-symbol consumers, re-export detection, and counts |
| 📈 | Interactive viewer | `codegraph plot` generates an interactive HTML graph viewer with hierarchical/force/radial layouts, complexity overlays, and drill-down |
| 🏷️ | Stable JSON schema | `normalizeSymbol` utility ensures consistent 7-field output (name, kind, file, line, endLine, role, fileHash) across all commands |

See docs/examples for real-world CLI and MCP usage examples.

📦 Commands

Build & Watch

bash
codegraph build [dir]             # Parse and build the dependency graph
codegraph build --no-incremental  # Force full rebuild
codegraph build --dataflow        # Extract data flow edges (flows_to, returns, mutates)
codegraph build --engine wasm     # Force WASM engine (skip native)
codegraph watch [dir]             # Watch for changes, update graph incrementally

Query & Explore

bash
codegraph query <name>                   # Find a symbol — shows callers and callees
codegraph deps <file>                    # File imports/exports
codegraph map                            # Top 20 most-connected files
codegraph map -n 50 --no-tests           # Top 50, excluding test files
codegraph where <name>                   # Where is a symbol defined and used?
codegraph where --file src/db.js         # List symbols, imports, exports for a file
codegraph stats                          # Graph health: nodes, edges, languages, quality score
codegraph roles                          # Node role classification (entry, core, utility, adapter, dead, leaf)
codegraph roles --role dead -T           # Find dead code (unreferenced, non-exported symbols)
codegraph roles --dynamic                # Show dynamic call sink edges (eval, computed-key, unresolved)
codegraph roles --role core --file src/  # Core symbols in src/
codegraph exports src/queries.js         # Per-symbol consumer analysis (who calls each export)
codegraph children <name>                # List parameters, properties, constants of a symbol

Deep Context (designed for AI agents)

bash
codegraph context <name>                       # Full context: source, deps, callers, signature, tests
codegraph context <name> --depth 2 --no-tests  # Include callee source 2 levels deep
codegraph brief <file>                         # Token-efficient file summary: symbols, roles, risk tiers
codegraph audit <file> --quick                 # Structural summary: public API, internals, data flow
codegraph audit <function> --quick             # Function summary: signature, calls, callers, tests

Impact Analysis

bash
codegraph impact <file>                              # Transitive reverse dependency trace
codegraph query <name>                               # Function-level: callers, callees, call chain
codegraph query <name> --no-tests --depth 5
codegraph fn-impact <name>                           # What functions break if this one changes
codegraph path <from> <to>                           # Shortest path between two symbols (A calls...calls B)
codegraph path <from> <to> --reverse                 # Follow edges backward
codegraph path <from> <to> --depth 5 --kinds calls,imports
codegraph diff-impact                                # Impact of unstaged git changes
codegraph diff-impact --staged                       # Impact of staged changes
codegraph diff-impact HEAD~3                         # Impact vs a specific ref
codegraph diff-impact main --format mermaid -T       # Mermaid flowchart of blast radius
codegraph branch-compare main feature-branch         # Structural diff between two refs
codegraph branch-compare main HEAD --no-tests        # Symbols added/removed/changed vs main
codegraph branch-compare v2.4.0 v2.5.0 --json        # JSON output for programmatic use
codegraph branch-compare main HEAD --format mermaid  # Mermaid diagram of structural changes
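A transitive reverse-dependency trace of the kind `impact` and `fn-impact` report can be pictured as a breadth-first walk over "who depends on me" edges, bounded by a depth limit. The sketch below is illustrative only: the `ReverseEdges` type, the `reverseImpact` function, and the file names are assumptions made for this example and are not codegraph's internal API.

```typescript
// Illustrative sketch only: names and data shapes are assumptions, not codegraph internals.
// Maps a file (or symbol) to the files (or symbols) that depend on it directly.
type ReverseEdges = Map<string, string[]>;

// Breadth-first walk over reverse edges, recording the depth at which
// each dependent is first reached.
function reverseImpact(edges: ReverseEdges, start: string, maxDepth: number): Map<string, number> {
  const depthOf = new Map<string, number>();
  let frontier: string[] = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const dependent of edges.get(node) ?? []) {
        // Skip the start node and anything already seen, so cycles terminate.
        if (dependent === start || depthOf.has(dependent)) continue;
        depthOf.set(dependent, depth);
        next.push(dependent);
      }
    }
    frontier = next;
  }
  return depthOf;
}

// Hypothetical graph: db.ts is imported by queries.ts, which is imported by cli.ts and mcp.ts.
const edges: ReverseEdges = new Map([
  ["src/db.ts", ["src/queries.ts"]],
  ["src/queries.ts", ["src/cli.ts", "src/mcp.ts"]],
  ["src/cli.ts", ["src/db.ts"]], // a cycle back to the start, to show termination
]);

const impacted = reverseImpact(edges, "src/db.ts", 5);
console.log(Array.from(impacted.entries()));
```

Lowering `maxDepth` in this sketch plays the same role as a smaller `--depth` value: only the nearest dependents are reported.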

Co-Change Analysis

Analyze git history to find files that always change together — surfaces hidden coupling the static graph can't see. Requires a git repository.

bash
codegraph co-change --analyze          # Scan git history and populate co-change data
codegraph co-change src/queries.js     # Show co-change partners for a file
codegraph co-change                    # Show top co-changing file pairs globally
codegraph co-change --since 6m         # Limit to last 6 months of history
codegraph co-change --min-jaccard 0.5  # Only show strong coupling (Jaccard >= 0.5)
codegraph co-change --min-support 5    # Minimum co-commit count
codegraph co-change --full             # Include all details

Co-change data also enriches diff-impact — historically coupled files appear in a historicallyCoupled section alongside the static dependency analysis.
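The `--min-jaccard` and `--min-support` flags suggest that coupling is scored per file pair. A minimal sketch of that idea follows, assuming the score is the Jaccard similarity of the sets of commits touching each file (commits touching both, divided by commits touching either) and that support is the co-commit count. The function name, field names, and sample history are hypothetical; codegraph's own computation may differ.

```typescript
// Illustrative sketch only: assumes coupling = Jaccard similarity over commit sets.
interface CoChangePair {
  a: string;
  b: string;
  support: number; // commits that touched both files
  jaccard: number; // support / commits that touched either file
}

function coChangePairs(commits: string[][], minSupport: number, minJaccard: number): CoChangePair[] {
  const touches = new Map<string, number>();        // file -> commits touching it
  const together = new Map<string, CoChangePair>(); // pair key -> running pair record

  for (const files of commits) {
    const unique = Array.from(new Set(files)).sort();
    for (const f of unique) touches.set(f, (touches.get(f) ?? 0) + 1);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = JSON.stringify([unique[i], unique[j]]);
        const pair = together.get(key) ?? { a: unique[i], b: unique[j], support: 0, jaccard: 0 };
        pair.support += 1;
        together.set(key, pair);
      }
    }
  }

  const result: CoChangePair[] = [];
  together.forEach((pair) => {
    const union = (touches.get(pair.a) ?? 0) + (touches.get(pair.b) ?? 0) - pair.support;
    pair.jaccard = pair.support / union;
    if (pair.support >= minSupport && pair.jaccard >= minJaccard) result.push(pair);
  });
  return result.sort((x, y) => y.jaccard - x.jaccard);
}

// Hypothetical history: each inner array is the set of files in one commit.
const history: string[][] = [
  ["src/queries.js", "src/db.js"],
  ["src/queries.js", "src/db.js", "README.md"],
  ["src/queries.js"],
  ["README.md"],
];

const strong = coChangePairs(history, 2, 0.5);
console.log(strong);
```

In this sample only `src/db.js` and `src/queries.js` pass both thresholds: they share two commits out of the three that touch either file.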

Structure & Hotspots

bash
codegraph structure            # Directory overview with cohesion scores
codegraph triage --level file  # Files with extreme fan-in, fan-out, or density
codegraph triage --level directory --sort coupling --no-tests

Code Health & Architecture

bash
codegraph complexity                       # Per-function cognitive, cyclomatic, nesting, MI
codegraph complexity --health -T           # Full Halstead health view (volume, effort, bugs, MI)
codegraph complexity --sort mi -T          # Sort by worst maintainability index
codegraph complexity --above-threshold -T  # Only functions exceeding warn thresholds
codegraph communities                      # Leiden community detection — natural module boundaries
codegraph communities --drift -T           # Drift analysis only — split/merge candidates
codegraph communities --functions          # Function-level community detection
codegraph check                            # Pass/fail rule engine (exit code 1 on fail)
codegraph check -T                         # Exclude test files from rule evaluation

Dataflow, CFG & AST

bash
codegraph dataflow <name>             # Data flow edges for a function (flows_to, returns, mutates)
codegraph dataflow <name> --impact    # Transitive data-dependent blast radius
codegraph cfg <name>                  # Control flow graph (text format)
codegraph cfg <name> --format dot     # CFG as Graphviz DOT
codegraph cfg <name> --format mermaid # CFG as Mermaid diagram
codegraph ast                         # List all stored AST nodes
codegraph ast "handleAuth"            # Search AST nodes by pattern (GLOB)
codegraph ast -k call                 # Filter by kind: call, new, string, regex, throw, await
codegraph ast -k throw --file src/    # Combine kind and file filters

Note: Dataflow and CFG are included by default for all 34 languages. Use --no-dataflow / --no-cfg for faster builds.

Audit, Triage & Batch

Composite commands for risk-driven workflows and multi-agent dispatch.

bash
codegraph audit <file-or-function>     # Combined structural summary + impact + health in one report
codegraph audit <target> --quick       # Structural summary only (skip impact and health)
codegraph audit src/queries.js -T      # Audit all functions in a file
codegraph explain <target>             # Alias for audit — same output, easier to discover
codegraph triage                       # Ranked audit priority queue (connectivity + hotspots + roles)
codegraph triage -T --limit 20         # Top 20 riskiest functions, excluding tests
codegraph triage --level file -T       # File-level hotspot analysis
codegraph triage --level directory -T  # Directory-level hotspot analysis
codegraph batch target1 target2 ...    # Batch query multiple targets in one call
codegraph batch --json targets.json    # Batch from a JSON file

CI Validation

codegraph check provides configurable pass/fail predicates for CI gates and state machines. Exit code 0 = pass, 1 = fail.

bash
codegraph check                           # Run manifesto rules on whole codebase
codegraph check --staged                  # Check staged changes (diff predicates)
codegraph check --staged --rules          # Run both diff predicates AND manifesto rules
codegraph check --no-new-cycles           # Fail if staged changes introduce cycles
codegraph check --max-complexity 30       # Fail if any function exceeds complexity threshold
codegraph check --max-blast-radius 50     # Fail if blast radius exceeds limit
codegraph check --no-boundary-violations  # Fail on architecture boundary violations
codegraph check main                      # Check current branch vs main
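The pass/fail contract (exit code 0 when every gate holds, 1 otherwise) can be sketched as a small predicate evaluator. Everything in this example is an assumption made for illustration: the `ChangeMetrics` and `Gates` shapes and the `evaluateGates` function are hypothetical and only loosely mirror the `--max-complexity`, `--max-blast-radius`, and `--no-new-cycles` flags; the real rule engine may be structured differently.

```typescript
// Illustrative sketch only: metric and gate shapes are assumptions for this example.
interface ChangeMetrics {
  maxComplexity: number; // highest per-function complexity in the change
  blastRadius: number;   // number of functions transitively affected
  newCycles: number;     // circular dependencies introduced by the change
}

interface Gates {
  maxComplexity?: number;
  maxBlastRadius?: number;
  noNewCycles?: boolean;
}

// Only gates that are actually configured are evaluated.
function evaluateGates(m: ChangeMetrics, g: Gates): { exitCode: 0 | 1; failures: string[] } {
  const failures: string[] = [];
  if (g.maxComplexity !== undefined && m.maxComplexity > g.maxComplexity) {
    failures.push(`complexity ${m.maxComplexity} exceeds ${g.maxComplexity}`);
  }
  if (g.maxBlastRadius !== undefined && m.blastRadius > g.maxBlastRadius) {
    failures.push(`blast radius ${m.blastRadius} exceeds ${g.maxBlastRadius}`);
  }
  if (g.noNewCycles === true && m.newCycles > 0) {
    failures.push(`${m.newCycles} new cycle(s) introduced`);
  }
  return { exitCode: failures.length === 0 ? 0 : 1, failures };
}

// Hypothetical metrics for a staged change.
const metrics: ChangeMetrics = { maxComplexity: 34, blastRadius: 12, newCycles: 0 };
const strict = evaluateGates(metrics, { maxComplexity: 30, maxBlastRadius: 50, noNewCycles: true });
const lenient = evaluateGates(metrics, { maxComplexity: 40 });
console.log(strict, lenient);
```

A CI step would then treat a non-zero exit code as a failed build, which is how the `check` command is described as behaving.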

CODEOWNERS

Map graph symbols to CODEOWNERS entries. Shows who owns each function and surfaces ownership boundaries.

bash
codegraph owners                   # Show ownership for all symbols
codegraph owners src/queries.js    # Ownership for symbols in a specific file
codegraph owners --boundary        # Show ownership boundaries between modules
codegraph owners --owner @backend  # Filter by owner

Ownership data also enriches diff-impact — affected owners and suggested reviewers appear alongside the static dependency analysis.

Snapshots

Lightweight SQLite DB backup and restore — checkpoint before refactoring, instantly rollback without rebuilding.

bash
codegraph snapshot save before-refactor     # Save a named snapshot
codegraph snapshot list                     # List all snapshots
codegraph snapshot restore before-refactor  # Restore a snapshot
codegraph snapshot delete before-refactor   # Delete a snapshot

Export & Visualization

bash
codegraph export -f dot                    # Graphviz DOT format
codegraph export -f mermaid                # Mermaid diagram
codegraph export -f json                   # JSON graph
codegraph export -f graphml                # GraphML (XML standard)
codegraph export -f graphson               # GraphSON (TinkerPop v3 / Gremlin)
codegraph export -f neo4j                  # Neo4j CSV (bulk import, separate nodes/relationships files)
codegraph export --functions -o graph.dot  # Function-level, write to file
codegraph plot                             # Interactive HTML viewer with force/hierarchical/radial layouts
codegraph cycles                           # Detect circular dependencies
codegraph cycles --functions               # Function-level cycles

Local embeddings for every function, method, and class — search by natural language. Everything runs locally using @huggingface/transformers — no API keys needed.

bash
codegraph embed                                  # Build embeddings (default: nomic)
codegraph embed --model nomic-v1.5               # Use a different model
codegraph search "handle authentication"
codegraph search "parse config" --min-score 0.4 -n 10
codegraph search "parseConfig" --mode keyword    # BM25 keyword-only (exact names)
codegraph search "auth flow" --mode semantic     # Embedding-only (conceptual)
codegraph search "auth flow" --mode hybrid       # BM25 + semantic RRF fusion (default)
codegraph models                                 # List available models

Separate queries with ; to search from multiple angles at once. Results are ranked using Reciprocal Rank Fusion (RRF) — items that rank highly across multiple queries rise to the top.

bash
codegraph search "auth middleware; JWT validation"
codegraph search "parse config; read settings; load env" -n 20
codegraph search "error handling; retry logic" --kind function
codegraph search "database connection; query builder" --rrf-k 30

A single trailing semicolon is ignored (falls back to single-query mode). The --rrf-k flag controls the RRF smoothing constant (default 60) — lower values give more weight to top-ranked results.
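For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each result earns `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. It is a generic illustration, not codegraph's actual implementation; the `rrfFuse` function and the sample result lists are hypothetical.

```typescript
// Illustrative sketch of Reciprocal Rank Fusion; not codegraph's actual implementation.
// Each inner array is one ranked result list (best first), e.g. one per sub-query.
function rrfFuse(rankings: string[][], k: number = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists for the two halves of "auth middleware; JWT validation".
const fused = rrfFuse([
  ["authMiddleware", "validateToken", "login"],
  ["decryptJWT", "validateToken", "authMiddleware"],
]);
console.log(fused.map((r) => r.id));
```

Items that appear in both lists (`authMiddleware`, `validateToken`) outrank items that appear in only one. Because the difference between `1 / (k + 1)` and `1 / (k + 2)` grows as `k` shrinks, a lower `k` rewards top-ranked hits more strongly, which matches the description of `--rrf-k` above.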

Available Models

Per-model retrieval quality (Hit@N) and timing are measured on every release — see EMBEDDING-BENCHMARKS.md.

| Flag | Model | Dimensions | Size | License | Notes |
|---|---|---|---|---|---|
| minilm | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
| jina-small | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
| jina-base | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
| jina-code | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text |
| nomic (default) | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
| nomic-v1.5 | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | Matryoshka MRL training (unused — codegraph stores full 768d); v1 scores higher on our benchmark |
| bge-large | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
| mxbai-xsmall | mxbai-embed-xsmall-v1 | 384 | ~50 MB | Apache-2.0 | Tiny + long context (4096) |
| mxbai-large | mxbai-embed-large-v1 | 1024 | ~400 MB | Apache-2.0 | Top MTEB BERT-large |
| bge-m3 | bge-m3 | 1024 | ~600 MB | MIT | Multilingual (100+ languages), 8192 context |
| modernbert | modernbert-embed-base | 768 | ~150 MB | Apache-2.0 | ModernBERT architecture, 8192 ctx, English |

The model used during embed is stored in the database, so search auto-detects it — no need to pass --model when searching.

Multi-Repo Registry

Manage a global registry of codegraph-enabled projects. The registry stores paths to your built graphs so the MCP server can query them when multi-repo mode is enabled.

bash
codegraph registry list                  # List all registered repos
codegraph registry list --json           # JSON output
codegraph registry add <dir>             # Register a project directory
codegraph registry add <dir> -n my-name  # Custom name
codegraph registry remove <name>         # Unregister

codegraph build auto-registers the project — no manual setup needed.

Configuration

Inspect and manage .codegraphrc.json settings.

bash
codegraph config                   # Show all config keys with values and sources
codegraph config --json            # JSON output of the merged config
codegraph config --init            # Scaffold a .codegraphrc.json with all sections pre-populated
codegraph config --edit            # Open .codegraphrc.json in $EDITOR
codegraph config --enable-global   # Opt this repo into user-level global config
codegraph config --disable-global  # Opt this repo out of user-level global config
codegraph config --list-global     # Show the contents of the global config file

A user-level config file at ~/.config/codegraph/config.json (XDG) or ~/.codegraph/config.json lets you set personal defaults once and apply them to opted-in repos. The merge order is: DEFAULTS → global (if consented) → project → env. Non-interactive contexts (CI, MCP) never apply the global config without explicit consent. See docs/guides/configuration.md for full details.
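The merge order can be pictured as layers applied left to right, with later layers overriding earlier ones and the global layer included only when consent was given. The sketch below is a simplified illustration under stated assumptions: it merges shallowly, and the config keys (`engine`, `limit`, `verbose`) and the `mergeConfig` function are hypothetical. Only the layer order comes from the text above.

```typescript
// Illustrative sketch only: keys are hypothetical; only the layer order
// (DEFAULTS -> global if consented -> project -> env) comes from the README.
type Config = Record<string, unknown>;

function mergeConfig(
  defaults: Config,
  globalCfg: Config | null,
  globalConsented: boolean,
  project: Config,
  env: Config,
): Config {
  const layers: Config[] = [defaults];
  // The global layer is skipped entirely unless consent was given.
  if (globalCfg !== null && globalConsented) layers.push(globalCfg);
  layers.push(project, env);
  // Shallow merge: later layers override earlier ones key by key.
  return Object.assign({}, ...layers);
}

const defaults: Config = { engine: "auto", limit: 20 };
const globalCfg: Config = { limit: 50, verbose: true };
const project: Config = { limit: 100 };
const env: Config = { engine: "wasm" };

const withGlobal = mergeConfig(defaults, globalCfg, true, project, env);
const withoutGlobal = mergeConfig(defaults, globalCfg, false, project, env);
console.log(withGlobal, withoutGlobal);
```

In the consented case the global `verbose` default survives because no later layer sets it, while `limit` and `engine` are overridden by the project and environment layers.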

Common Flags

| Flag | Description |
|---|---|
| `-d, --db <path>` | Custom path to graph.db |
| `-T, --no-tests` | Exclude .test., .spec., __test__ files (available on most query commands including query, fn-impact, path, context, where, diff-impact, search, map, roles, co-change, deps, impact, complexity, communities, branch-compare, audit, triage, check, dataflow, cfg, ast, exports, children) |
| `--depth <n>` | Transitive trace depth (default varies by command) |
| `-j, --json` | Output as JSON |
| `-v, --verbose` | Enable debug output |
| `--engine <engine>` | Parser engine: native, wasm, or auto (default: auto) |
| `-k, --kind <kind>` | Filter by kind: function, method, class, interface, type, struct, enum, trait, record, module, parameter, property, constant |
| `-f, --file <path>` | Scope to a specific file (fn, context, where) |
| `--mode <mode>` | Search mode: hybrid (default), semantic, or keyword (search) |
| `--ndjson` | Output as newline-delimited JSON (one object per line) |
| `--table` | Output as auto-column aligned table |
| `--csv` | Output as CSV (RFC 4180, nested objects flattened) |
| `--limit <n>` | Limit number of results |
| `--offset <n>` | Skip first N results (pagination) |
| `--rrf-k <n>` | RRF smoothing constant for multi-query search (default 60) |
| `--user-config [path]` | Apply global user config for this run; optionally specify a custom path instead of the XDG default (~/.config/codegraph/config.json) |
| `--no-user-config` | Skip global user config for this run (CI/non-interactive safe) |
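As a rough picture of how `--limit`, `--offset`, and `--ndjson` combine, the sketch below slices a result list and prints one JSON object per line. The `paginate` and `toNdjson` helpers and the sample symbols are hypothetical and only illustrate the output shape; they are not taken from codegraph's source.

```typescript
// Illustrative sketch only: helper names and sample data are assumptions.
function paginate<T>(items: T[], limit: number, offset: number = 0): T[] {
  // Skip the first `offset` results, then keep at most `limit` of the rest.
  return items.slice(offset, offset + limit);
}

function toNdjson(items: unknown[]): string {
  // Newline-delimited JSON: one serialized object per line.
  return items.map((item) => JSON.stringify(item)).join("\n");
}

const symbols = [
  { name: "a" }, { name: "b" }, { name: "c" }, { name: "d" }, { name: "e" },
];

const page = paginate(symbols, 2, 2);
const ndjson = toNdjson(page);
console.log(ndjson);
```

Each line of the output parses on its own, so a consumer can process results as they arrive instead of waiting for one large JSON document.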

🌐 Language Support

LanguageExtensionsImportsExportsCall SitesHeritage¹Type Inference²Dataflow
JavaScript.js, .jsx, .mjs, .cjs
TypeScript.ts, .tsx
Python.py, .pyi
Go.go
Rust.rs
Java.java
C#.cs
PHP.php, .phtml
Ruby.rb, .rake, .gemspec—³
C.c, .h—⁴—⁴
C++.cpp, .hpp, .cc, .cxx
Kotlin.kt, .kts
Swift.swift
Scala.scala, .sc
Bash.sh, .bash—⁴—⁴
Elixir.ex, .exs
Lua.lua
Dart.dart
Zig.zig
Haskell.hs
OCaml.ml, .mli
F#.fs, .fsx, .fsi
Gleam.gleam
Clojure.clj, .cljs, .cljc
Julia.jl
R.r, .R
Erlang.erl, .hrl
Solidity.sol
Objective-C.m
CUDA.cu, .cuh
Groovy.groovy, .gvy
Verilog.v, .sv
Terraform.tf, .hcl—³—³—³—³—³

¹ Heritage = extends, implements, include/extend (Ruby), trait impl (Rust), receiver methods (Go).
² Type Inference extracts a per-file type map from annotations (const x: Router, MyType x, x: MyType) and new expressions, enabling the edge resolver to connect x.method() → Type.method().
³ Not applicable — Ruby is dynamically typed; Terraform/HCL is declarative (no functions, classes, or type system).
⁴ Not applicable — C and Bash have no class/inheritance system.
All languages have full parity between the native Rust engine and the WASM fallback.
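The per-file type map described in footnote ² can be pictured as a lookup from variable names to type names, which lets a call on a variable be rewritten as a call on its type. This is a deliberately simplified sketch: the `TypeMap` alias, the `resolveCall` function, and the sample entries are hypothetical, and the real edge resolver presumably handles far more cases.

```typescript
// Illustrative sketch only: a tiny stand-in for resolving receiver.method calls.
type TypeMap = Map<string, string>; // variable name -> declared or constructed type

function resolveCall(callee: string, types: TypeMap): string {
  const dot = callee.indexOf(".");
  if (dot === -1) return callee; // plain function call, nothing to rewrite
  const receiver = callee.slice(0, dot);
  const method = callee.slice(dot + 1);
  const type = types.get(receiver);
  // Unknown receivers are left untouched rather than guessed.
  return type === undefined ? callee : `${type}.${method}`;
}

const types: TypeMap = new Map([
  ["router", "Router"], // e.g. from an annotation such as `const router: Router`
  ["db", "Database"],   // e.g. from an expression such as `new Database()`
]);

console.log(resolveCall("router.get", types));
console.log(resolveCall("cache.get", types));
```

With the map in place, `router.get` resolves to `Router.get`, while a call on an unknown receiver such as `cache.get` stays as written.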

⚙️ How It Works

┌──────────┐    ┌───────────┐    ┌───────────┐    ┌──────────┐    ┌─────────┐
│  Source  │──▶│ tree-sitter│──▶│  Extract  │──▶│  Resolve │──▶│ SQLite  │
│  Files   │    │   Parse   │    │  Symbols  │    │  Imports │    │   DB    │
└──────────┘    └───────────┘    └───────────┘    └──────────┘    └─────────┘
                                                                       │
                                                                       ▼
                                                                 ┌─────────┐
                                                                 │  Query  │
                                                                 └─────────┘
  1. Parse — tree-sitter parses every source file into an AST (native Rust engine or WASM fallback)
  2. Extract — Functions, classes, methods, interfaces, imports, exports, call sites, parameters, properties, and constants are extracted
  3. Resolve — Imports are resolved to actual files (handles ESM conventions, tsconfig.json path aliases, baseUrl)
  4. Store — Everything goes into SQLite as nodes + edges with tree-sitter node boundaries, plus structural edges (contains, parameter_of, receiver)
  5. Analyze (opt-in) — Complexity metrics, control flow graphs (--cfg), dataflow edges (--dataflow), and AST node storage
  6. Query — All queries run locally against the SQLite DB — typically under 100ms
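
The alias handling in the Resolve step can be pictured as a prefix rewrite. The sketch below is only an illustration under stated assumptions: `resolveAlias` and `AliasMap` are hypothetical names, the alias table mirrors the `aliases` key shown in the configuration section, and the real resolver also has to handle file extensions, `baseUrl`, and ESM conventions that are not modelled here.

```typescript
import * as path from "node:path";

// Hypothetical alias table, shaped like the "aliases" key of .codegraphrc.json.
type AliasMap = Record<string, string>;

/**
 * Rewrites an aliased import specifier to a path under the project root.
 * Returns null when no alias prefix matches (for example bare package names).
 */
function resolveAlias(specifier: string, aliases: AliasMap, projectRoot: string): string | null {
  // Check longer prefixes first so the most specific alias wins.
  const entries = Object.entries(aliases).sort(([a], [b]) => b.length - a.length);
  for (const [prefix, target] of entries) {
    if (specifier.startsWith(prefix)) {
      const rest = specifier.slice(prefix.length);
      return path.posix.join(projectRoot, target, rest);
    }
  }
  return null;
}

const exampleAliases: AliasMap = { "@/": "./src/", "@utils/": "./src/utils/" };
console.log(resolveAlias("@/server/router", exampleAliases, "/repo")); // /repo/src/server/router
console.log(resolveAlias("@utils/log", exampleAliases, "/repo"));      // /repo/src/utils/log
console.log(resolveAlias("commander", exampleAliases, "/repo"));       // null
```

The returned path has no extension; a full resolver would still need to probe candidate files on disk before creating an edge.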

Incremental Rebuilds

The graph stays current without re-parsing your entire codebase. Three-tier change detection ensures rebuilds are proportional to what changed, not the size of the project:

  1. Tier 0 — Journal (O(changed)): If codegraph watch was running, a change journal records exactly which files were touched. The next build reads the journal and only processes those files — zero filesystem scanning
  2. Tier 1 — mtime+size (O(n) stats, O(changed) reads): No journal? Codegraph stats every file and compares mtime + size against stored values. Matching files are skipped without reading a single byte
  3. Tier 2 — Hash (O(changed) reads): Files that fail the mtime/size check are read and MD5-hashed. Only files whose hash actually changed get re-parsed and re-inserted

Result: change one file in a 3,000-file project and the rebuild completes in under a second. Put it in a commit hook, a file watcher, or let your AI agent trigger it.
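
Tiers 1 and 2 can be sketched as a two-stage filter: a cheap stat comparison first, then a content hash only for files that fail it. This is a minimal sketch, not codegraph's implementation. `StoredFile`, `Candidate`, `classifyChange`, and `candidateFromDisk` are hypothetical names, and treating a never-seen file as changed is an assumption.

```typescript
import { createHash } from "node:crypto";
import { readFileSync, statSync } from "node:fs";

// Hypothetical record of what a previous build stored for one file.
interface StoredFile { mtimeMs: number; size: number; md5: string; }
// Current state of a file; read() is only called when the stat check fails.
interface Candidate { mtimeMs: number; size: number; read: () => Uint8Array | string; }
type Verdict = "unchanged-by-stat" | "unchanged-by-hash" | "changed";

function md5Hex(data: Uint8Array | string): string {
  return createHash("md5").update(data).digest("hex");
}

function classifyChange(current: Candidate, stored?: StoredFile): Verdict {
  if (!stored) return "changed"; // assumption: unknown files are always parsed
  // Tier 1: identical mtime and size, skip without reading any bytes.
  if (current.mtimeMs === stored.mtimeMs && current.size === stored.size) {
    return "unchanged-by-stat";
  }
  // Tier 2: read and hash; only a different hash triggers a re-parse.
  return md5Hex(current.read()) === stored.md5 ? "unchanged-by-hash" : "changed";
}

// Builds a Candidate from a real file; contents are read lazily.
function candidateFromDisk(file: string): Candidate {
  const st = statSync(file);
  return { mtimeMs: st.mtimeMs, size: st.size, read: () => readFileSync(file) };
}

// In-memory example: the file was touched (new mtime) but its bytes are the same.
const previous: StoredFile = { mtimeMs: 1000, size: 5, md5: md5Hex("hello") };
console.log(classifyChange({ mtimeMs: 2000, size: 5, read: () => "hello" }, previous));
```

Keeping the read behind a callback is what makes Tier 1 free of file I/O beyond the stat call.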

What incremental rebuilds refresh — and what they don't

Incremental builds re-parse changed files and rebuild their edges, structure metrics, and role classifications. But some data is only fully refreshed on a full rebuild:

| Data | Incremental | Full rebuild |
|---|---|---|
| Symbols & edges for changed files | Yes | Yes |
| Reverse-dependency cascade (importers of changed files) | Yes | Yes |
| AST nodes, complexity, CFG, dataflow for changed files | Yes | Yes |
| Directory-level cohesion metrics | Partial (skipped for ≤5 files) | Yes |
| Advisory checks (orphaned embeddings, stale embeddings, unused exports) | Skipped | Yes |
| Build metadata persistence | Skipped for ≤3 files | Yes |
| Incremental drift detection | Skipped | Yes |

When to run a full rebuild:

```bash
codegraph build --no-incremental   # Force full rebuild
```

  • After large refactors (renames, moves, deleted files) — the reverse-dependency cascade handles most cases, but a full rebuild ensures nothing is stale
  • If you suspect stale analysis data — complexity or dataflow results for files you didn't directly edit won't update incrementally
  • Periodically — if you rely heavily on complexity, dataflow, roles --role dead, or communities queries, run a full rebuild weekly or after major merges
  • After upgrading codegraph — engine, schema, or version changes trigger an automatic full rebuild, but if you skip versions you may want to force one

Codegraph auto-detects and forces a full rebuild when the engine, schema version, or codegraph version changes between builds. For everything else, incremental is the safe default — a full rebuild is a correctness guarantee, not a frequent necessity.

Detailed guide: See docs/guides/incremental-builds.md for a complete breakdown of what each build mode refreshes and recommended rebuild schedules.

Dual Engine

Codegraph ships with two parsing engines:

| Engine | How it works | When it's used |
|---|---|---|
| Native (Rust) | napi-rs addon built from crates/codegraph-core/ — parallel multi-core parsing via rayon | Auto-selected when the prebuilt binary is available |
| WASM | web-tree-sitter with pre-built .wasm grammars in grammars/ | Fallback when the native addon isn't installed |

Both engines produce identical output. Use --engine native|wasm|auto to control selection (default: auto).

On the native path, Rust handles the entire hot pipeline end-to-end:

| Phase | What Rust does |
|---|---|
| Parse | Parallel multi-file tree-sitter parsing via rayon (3.5× faster than WASM) |
| Extract | Symbols, imports, calls, classes, type maps, AST nodes — all in one pass |
| Analyze | Complexity (cognitive, cyclomatic, Halstead), CFG, and dataflow pre-computed per function during parse |
| Resolve | Import resolution with 6-level priority system and confidence scoring |
| Edges | Call, receiver, extends, and implements edge inference |
| DB writes | All inserts (nodes, edges, AST nodes, complexity, CFG, dataflow) via rusqlite — better-sqlite3 is lazy-loaded only for the WASM fallback path |

The Rust crate (crates/codegraph-core/) exposes a NativeDatabase napi-rs class that holds a persistent rusqlite::Connection for the full build lifecycle, eliminating JS↔SQLite round-trips on every operation.

Call Resolution

Calls are resolved with qualified resolution — method calls (obj.method()) are distinguished from standalone function calls, and built-in receivers (console, Math, JSON, Array, Promise, etc.) are filtered out automatically. Import scope is respected: a call to foo() only resolves to functions that are actually imported or defined in the same file, eliminating false positives from name collisions.

| Priority | Source | Confidence |
|---|---|---|
| 1 | Import-aware: import { foo } from './bar' → link to bar | 1.0 |
| 2 | Same-file — definitions in the current file | 1.0 |
| 3 | Same directory — definitions in sibling files (standalone calls only) | 0.7 |
| 4 | Same parent directory — definitions in sibling dirs (standalone calls only) | 0.5 |
| 5 | Method hierarchy — resolved through extends/implements | varies |

Method calls on unknown receivers skip global fallback entirely — stmt.run() will never resolve to a standalone run function in another file. Duplicate caller/callee edges are deduplicated automatically. Dynamic patterns like fn.call(), fn.apply(), fn.bind(), and obj["method"]() are also detected on a best-effort basis.
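
The priority table can be read as a cascade that stops at the first tier producing a match. The sketch below covers tiers 1 to 4 only and is an assumption-laden illustration: `Definition`, `CallSite`, and `resolveCall` are hypothetical names, paths are plain forward-slash strings, and the method-hierarchy tier is left out.

```typescript
interface Definition { name: string; file: string; }
interface CallSite {
  name: string;
  callerFile: string;
  isMethodCall: boolean;
  imports: Record<string, string>; // imported name -> resolved file (hypothetical shape)
}
interface Resolution { target: Definition; confidence: number; via: string; }

function dirOf(file: string): string {
  const i = file.lastIndexOf("/");
  return i === -1 ? "" : file.slice(0, i);
}

function resolveCall(call: CallSite, defs: Definition[]): Resolution | null {
  const named = defs.filter((d) => d.name === call.name);

  // Tier 1: the name was imported, so link to the imported file.
  const importedFile = call.imports[call.name];
  const viaImport = named.find((d) => d.file === importedFile);
  if (viaImport) return { target: viaImport, confidence: 1.0, via: "import" };

  // Tier 2: a definition in the caller's own file.
  const sameFile = named.find((d) => d.file === call.callerFile);
  if (sameFile) return { target: sameFile, confidence: 1.0, via: "same-file" };

  // Directory fallbacks apply to standalone calls only.
  if (call.isMethodCall) return null;

  // Tier 3: a sibling file in the same directory.
  const callerDir = dirOf(call.callerFile);
  const sibling = named.find((d) => dirOf(d.file) === callerDir);
  if (sibling) return { target: sibling, confidence: 0.7, via: "same-directory" };

  // Tier 4: a file in a sibling directory under the same parent.
  const parentDir = dirOf(callerDir);
  const cousin = named.find((d) => dirOf(dirOf(d.file)) === parentDir);
  if (cousin) return { target: cousin, confidence: 0.5, via: "same-parent" };

  return null;
}

const defs: Definition[] = [{ name: "run", file: "src/db/stmt.ts" }];
// A method call such as stmt.run() never falls back to a same-named function elsewhere.
console.log(resolveCall({ name: "run", callerFile: "src/cli/build.ts", isMethodCall: true, imports: {} }, defs));
```

Returning early for method calls is what keeps `stmt.run()` from being linked to an unrelated standalone `run` function.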

Codegraph also extracts symbols from common callback patterns: Commander .command().action() callbacks (as command:build), Express route handlers (as route:GET /api/users), and event emitter listeners (as event:data).

📊 Performance

Self-measured on every release via CI (build benchmarks | embedding benchmarks | query benchmarks | incremental benchmarks | resolution precision/recall):

Last updated: v3.15.0 (2026-06-23)

| Metric | Native | WASM |
|---|---|---|
| Build speed | 5.8 ms/file | 26.2 ms/file |
| Query time | 33ms | 47ms |
| No-op rebuild | 24ms | 24ms |
| 1-file rebuild | 117ms | 112ms |
| Query: fn-deps | 2.4ms | 2.2ms |
| Query: path | 2.4ms | 2.1ms |
| ~50,000 files (est.) | ~290.0s build | ~1310.0s build |
| Resolution precision | 93.6% | |
| Resolution recall | 64.7% | |

Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files.
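
The ~50,000-file estimates are consistent with a straight linear extrapolation of the per-file build speed. The snippet below only reproduces that arithmetic; `estimateBuildSeconds` is a hypothetical helper, and linear scaling is an assumption that real projects may not follow.

```typescript
// Linear extrapolation: milliseconds per file times file count, converted to seconds.
function estimateBuildSeconds(msPerFile: number, fileCount: number): number {
  return (msPerFile * fileCount) / 1000;
}

console.log(estimateBuildSeconds(5.8, 50_000).toFixed(1));  // native: 290.0
console.log(estimateBuildSeconds(26.2, 50_000).toFixed(1)); // WASM: 1310.0
```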

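<details><summary>Per-language resolution precision/recall</summary>

| Language | Precision | Recall | TP | FP | FN | Edges | Dynamic |
|---|---|---|---|---|---|---|---|
| javascript | 100.0% | 100.0% | 42 | 0 | 0 | 42 | 14/32 |
| typescript | 95.9% | 100.0% | 47 | 2 | 0 | 47 | |
| bash | 100.0% | 100.0% | 12 | 0 | 0 | 12 | 0/1 |
| c | 100.0% | 100.0% | 9 | 0 | 0 | 9 | |
| clojure | 80.0% | 26.7% | 4 | 1 | 11 | 15 | |
| cpp | 100.0% | 57.1% | 8 | 0 | 6 | 14 | |
| csharp | 100.0% | 100.0% | 23 | 0 | 0 | 23 | |
| cuda | 50.0% | 33.3% | 4 | 4 | 8 | 12 | |
| dart | 0.0% | 0.0% | 0 | 0 | 18 | 18 | |
| dynamic-groovy | 100.0% | 100.0% | 1 | 0 | 0 | 1 | |
| dynamic-java | 100.0% | 100.0% | 1 | 0 | 0 | 1 | |
| dynamic-javascript | 100.0% | 100.0% | 4 | 0 | 0 | 4 | |
| dynamic-kotlin | 100.0% | 100.0% | 3 | 0 | 0 | 3 | |
| dynamic-scala | 100.0% | 100.0% | 1 | 0 | 0 | 1 | |
| dynamic-typescript | 100.0% | 100.0% | 3 | 0 | 0 | 3 | |
| elixir | 100.0% | 81.0% | 17 | 0 | 4 | 21 | |
| erlang | 100.0% | 100.0% | 12 | 0 | 0 | 12 | |
| fsharp | 0.0% | 0.0% | 0 | 11 | 12 | 12 | |
| gleam | 100.0% | 26.7% | 4 | 0 | 11 | 15 | |
| go | 100.0% | 69.2% | 9 | 0 | 4 | 13 | 13/14 |
| groovy | 100.0% | 7.7% | 1 | 0 | 12 | 13 | |
| haskell | 100.0% | 33.3% | 4 | 0 | 8 | 12 | |
| hcl | 0.0% | 0.0% | 0 | 0 | 2 | 2 | |
| java | 100.0% | 80.0% | 16 | 0 | 4 | 20 | |
| julia | 100.0% | 73.3% | 11 | 0 | 4 | 15 | |
| kotlin | 92.3% | 63.2% | 12 | 1 | 7 | 19 | |
| lua | 100.0% | 15.4% | 2 | 0 | 11 | 13 | |
| objc | 100.0% | 46.2% | 6 | 0 | 7 | 13 | |
| ocaml | 100.0% | 8.3% | 1 | 0 | 11 | 12 | |
| php | 100.0% | 57.9% | 11 | 0 | 8 | 19 | |
| pts-javascript | 100.0% | 100.0% | 13 | 0 | 0 | 13 | |
| python | 100.0% | 60.0% | 9 | 0 | 6 | 15 | 15/15 |
| r | 100.0% | 100.0% | 11 | 0 | 0 | 11 | |
| ruby | 100.0% | 100.0% | 11 | 0 | 0 | 11 | 11/11 |
| rust | 100.0% | 64.3% | 9 | 0 | 5 | 14 | |
| scala | 100.0% | 100.0% | 7 | 0 | 0 | 7 | |
| solidity | 33.3% | 7.7% | 1 | 2 | 12 | 13 | |
| swift | 81.8% | 64.3% | 9 | 2 | 5 | 14 | 9/9 |
| tsx | 100.0% | 100.0% | 13 | 0 | 0 | 13 | |
| verilog | 0.0% | 0.0% | 0 | 0 | 4 | 4 | |
| zig | 66.7% | 13.3% | 2 | 1 | 13 | 15 | |

By resolution mode (all languages):

| Mode | Resolved | Expected | Recall |
|---|---|---|---|
| receiver-typed | 35 | 112 | 31.3% |
| module-function | 46 | 112 | 41.1% |
| static | 82 | 97 | 84.5% |
| same-file | 68 | 90 | 75.6% |
| interface-dispatched | 22 | 22 | 100.0% |
| dynamic | 16 | 16 | 100.0% |
| class-inheritance | 8 | 12 | 66.7% |
| callback | 7 | 7 | 100.0% |
| pts-spread | 4 | 4 | 100.0% |
| pts-define-property | 3 | 3 | 100.0% |
| pts-create-prototype | 2 | 2 | 100.0% |
| points-to | 2 | 2 | 100.0% |
| re-export | 2 | 2 | 100.0% |
| pts-for-of | 2 | 2 | 100.0% |
| pts-set | 2 | 2 | 100.0% |
| pts-array-from | 2 | 2 | 100.0% |
| trait-dispatch | 0 | 2 | 0.0% |
| define-property | 1 | 1 | 100.0% |
| defineProperty-accessor | 1 | 1 | 100.0% |
| package-function | 1 | 1 | 100.0% |

</details>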
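
The per-language figures follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), with the Edges column equal to TP + FN. The sketch below recomputes two rows from their counts. The helper names are hypothetical, and reporting 0% when a denominator is zero is an assumption inferred from the dart row (TP = 0, FP = 0, precision 0.0%).

```typescript
interface Counts { tp: number; fp: number; fn: number; }

// Share of resolved edges that are correct.
function precision(c: Counts): number {
  const resolved = c.tp + c.fp;
  return resolved === 0 ? 0 : c.tp / resolved; // assumption: 0/0 is reported as 0
}

// Share of expected edges that were found.
function recall(c: Counts): number {
  const expected = c.tp + c.fn;
  return expected === 0 ? 0 : c.tp / expected;
}

function pct(x: number): string {
  return (x * 100).toFixed(1) + "%";
}

const typescriptRow: Counts = { tp: 47, fp: 2, fn: 0 };
const kotlinRow: Counts = { tp: 12, fp: 1, fn: 7 };
console.log(pct(precision(typescriptRow)), pct(recall(typescriptRow))); // 95.9% 100.0%
console.log(pct(precision(kotlinRow)), pct(recall(kotlinRow)));         // 92.3% 63.2%
```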

Lightweight Footprint

<a href="https://www.npmjs.com/package/@optave/codegraph"><img src="https://img.shields.io/npm/unpacked-size/@optave/codegraph?style=flat-square&label=unpacked%20size" alt="npm unpacked size" /></a>

Only 3 runtime dependencies — everything else is optional or a devDependency:

| Dependency | What it does |
|---|---|
| better-sqlite3 | SQLite driver (WASM engine; lazy-loaded, not used for native-engine reads) |
| commander | CLI argument parsing |
| web-tree-sitter | WASM tree-sitter bindings |

Optional: @huggingface/transformers (semantic search), @modelcontextprotocol/sdk (MCP server) — lazy-loaded only when needed.

🤖 AI Agent Integration (Core)

MCP Server

Codegraph is built around a Model Context Protocol server with 34 tools (35 in multi-repo mode) — the primary way agents consume the graph:

```bash
codegraph mcp                # Single-repo mode (default) — only local project
codegraph mcp --multi-repo   # Enable access to all registered repos
codegraph mcp --repos a,b    # Restrict to specific repos (implies --multi-repo)
```

Single-repo mode (default): Tools operate only on the local .codegraph/graph.db. The repo parameter and list_repos tool are not exposed to the AI agent.

Multi-repo mode (--multi-repo): All tools gain an optional repo parameter to target any registered repository, and list_repos becomes available. Use --repos to restrict which repos the agent can access.

CLAUDE.md / Agent Instructions

Add this to your project's CLAUDE.md to help AI agents use codegraph. The full template with all commands is in the AI Agent Guide.

```markdown
## Codegraph

This project uses codegraph for dependency analysis. The graph is at `.codegraph/graph.db`.

### Before modifying code:
1. `codegraph where <name>` — find where the symbol lives
2. `codegraph audit --quick <target>` — understand the structure
3. `codegraph context <name> -T` — get full context (source, deps, callers)
4. `codegraph fn-impact <name> -T` — check blast radius before editing

### After modifying code:
5. `codegraph diff-impact --staged -T` — verify impact before committing

### Other useful commands
- `codegraph build .` — rebuild graph (incremental by default)
- `codegraph map` — module overview · `codegraph stats` — graph health
- `codegraph query <name> -T` — call chain · `codegraph path <from> <to> -T` — shortest path
- `codegraph deps <file>` — file deps · `codegraph exports <file> -T` — export consumers
- `codegraph audit <target> -T` — full risk report · `codegraph triage -T` — priority queue
- `codegraph check --staged` — CI gate · `codegraph batch t1 t2 -T --json` — batch query
- `codegraph search "<query>"` — semantic search · `codegraph cycles` — cycle detection
- `codegraph roles --role dead -T` — dead code · `codegraph complexity -T` — metrics
- `codegraph dataflow <name> -T` — data flow · `codegraph cfg <name> -T` — control flow

### Flags
- `-T` — exclude test files (use by default) · `-j` — JSON output
- `-f, --file <path>` — scope to file · `-k, --kind <kind>` — filter kind
```

Recommended Practices

See docs/guides/recommended-practices.md for integration guides:

  • Git hooks — auto-rebuild on commit, impact checks on push, commit message enrichment
  • CI/CD — PR impact comments, threshold gates, graph caching
  • AI agents — MCP server, CLAUDE.md templates, Claude Code hooks
  • Developer workflow — watch mode, explore-before-you-edit, semantic search
  • Secure credentials: apiKeyCommand with 1Password, Bitwarden, Vault, macOS Keychain, pass

For AI-specific integration, see the AI Agent Guide — a comprehensive reference covering the 6-step agent workflow, complete command-to-MCP mapping, Claude Code hooks, and token-saving patterns.

🔁 CI / GitHub Actions

Codegraph ships with a ready-to-use GitHub Actions workflow that comments impact analysis on every pull request.

Copy .github/workflows/codegraph-impact.yml to your repo, and every PR will get a comment like:

3 functions changed, 12 callers affected across 7 files

🛠️ Configuration

Create a .codegraphrc.json in your project root to customize behavior. The snippets below cover the most-used keys — see docs/guides/configuration.md for the full reference (every group, every key, every default).

Global (user-level) config: you can also define personal defaults once at ~/.config/codegraph/config.json and opt individual repos into it with codegraph config --enable-global. The global layer merges below the project config so repos always win, and non-interactive contexts (CI, MCP) never apply it without explicit consent. See docs/guides/configuration.md#user-level-global-configuration.

```json
{
  "include": ["src/**", "lib/**"],
  "exclude": ["**/*.test.js", "**/__mocks__/**"],
  "ignoreDirs": ["node_modules", ".git", "dist"],
  "ignoreAdditionalDirs": ["crates", "vendor"],
  "extensions": [".js", ".ts", ".tsx", ".py"],
  "aliases": {
    "@/": "./src/",
    "@utils/": "./src/utils/"
  },
  "build": {
    "incremental": true
  },
  "query": {
    "excludeTests": true
  }
}
```

Tip: excludeTests can also be set at the top level as a shorthand — { "excludeTests": true } is equivalent to nesting it under query. If both are present, the nested query.excludeTests takes precedence.
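
The precedence rule in the tip can be written as a single lookup. This is a sketch of the stated rule only: `CodegraphConfig` and `resolveExcludeTests` are hypothetical names, and the value used when neither key is present is left undefined here because the default is not stated in this section.

```typescript
// Only the two keys relevant to the shorthand are modelled here.
interface CodegraphConfig {
  excludeTests?: boolean;
  query?: { excludeTests?: boolean };
}

// The nested query.excludeTests wins; the top-level shorthand is the fallback.
function resolveExcludeTests(cfg: CodegraphConfig): boolean | undefined {
  return cfg.query?.excludeTests ?? cfg.excludeTests;
}

console.log(resolveExcludeTests({ excludeTests: true }));                                  // true
console.log(resolveExcludeTests({ excludeTests: true, query: { excludeTests: false } })); // false
```

Using `??` rather than `||` matters: an explicit nested `false` must override a top-level `true`.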

Manifesto rules

Configure pass/fail thresholds for codegraph check (manifesto mode):

```json
{
  "manifesto": {
    "rules": {
      "cognitive_complexity": { "warn": 15, "fail": 30 },
      "cyclomatic_complexity": { "warn": 10, "fail": 20 },
      "nesting_depth": { "warn": 4, "fail": 6 },
      "maintainability_index": { "warn": 40, "fail": 20 },
      "halstead_bugs": { "warn": 0.5, "fail": 1.0 }
    }
  }
}
```

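When any function crosses a fail threshold, codegraph check exits with code 1, which makes it suitable as a CI gate. In the example above, maintainability_index has a fail value (20) below its warn value (40), which implies that lower scores are worse for that rule.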
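
One way to picture how a warn/fail pair classifies a metric value is shown below. It is a sketch under assumptions, not the actual check logic: `evaluateRule` and `exitCodeFor` are hypothetical names, the comparison is assumed to be strict, and the direction of each rule is inferred from whether `fail` is above or below `warn` (as with `maintainability_index`).

```typescript
type Level = "pass" | "warn" | "fail";
interface Threshold { warn: number; fail: number; }

function evaluateRule(value: number, t: Threshold): Level {
  // Assumption: fail >= warn means higher is worse; fail < warn means lower is worse.
  const higherIsWorse = t.fail >= t.warn;
  const crossed = (limit: number): boolean => (higherIsWorse ? value > limit : value < limit);
  if (crossed(t.fail)) return "fail";
  if (crossed(t.warn)) return "warn";
  return "pass";
}

// Exit code 1 as soon as any evaluated function reaches "fail".
function exitCodeFor(levels: Level[]): number {
  return levels.some((l) => l === "fail") ? 1 : 0;
}

const cognitive: Threshold = { warn: 15, fail: 30 };
const maintainability: Threshold = { warn: 40, fail: 20 };
console.log(evaluateRule(31, cognitive));       // fail
console.log(evaluateRule(30, maintainability)); // warn
console.log(exitCodeFor(["pass", "warn"]));     // 0
```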

LLM credentials

Codegraph supports an apiKeyCommand field for secure credential management. Instead of storing API keys in config files or environment variables, you can shell out to a secret manager at runtime:

```json
{
  "llm": {
    "provider": "openai",
    "apiKeyCommand": "op read op://vault/openai/api-key"
  }
}
```

The command is split on whitespace and executed with execFileSync (no shell injection risk). Priority: command output > CODEGRAPH_LLM_API_KEY env var > file config. On failure, codegraph warns and falls back to the next source.
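
The lookup order described above can be sketched with Node's `execFileSync`, which runs a program directly with an argument array and no shell. Everything else here is an assumption for illustration: `resolveApiKey`, the `Runner` type, and the `fileApiKey` field are hypothetical, and the injectable runner exists only so the example can run without a real secret manager.

```typescript
import { execFileSync } from "node:child_process";

// Runs a program with arguments and returns its stdout.
type Runner = (file: string, args: string[]) => string;

const defaultRunner: Runner = (file, args) =>
  execFileSync(file, args, { encoding: "utf8" }); // no shell is involved

interface KeySources {
  apiKeyCommand?: string;
  env: Record<string, string | undefined>;
  fileApiKey?: string; // hypothetical: a key stored directly in the config file
}

function resolveApiKey(src: KeySources, run: Runner = defaultRunner): string | undefined {
  if (src.apiKeyCommand) {
    // Plain whitespace split: quoted arguments are not supported by this scheme.
    const [file, ...args] = src.apiKeyCommand.trim().split(/\s+/);
    try {
      const key = file ? run(file, args).trim() : "";
      if (key) return key;
    } catch (err) {
      console.warn(`apiKeyCommand failed, falling back: ${String(err)}`);
    }
  }
  // Next source: environment variable, then the value from the config file.
  return src.env.CODEGRAPH_LLM_API_KEY || src.fileApiKey;
}

// Example with a fake runner standing in for the 1Password CLI.
const fakeOp: Runner = () => "sk-example\n";
console.log(resolveApiKey({ apiKeyCommand: "op read op://vault/openai/api-key", env: {} }, fakeOp));
```

Because the command is never passed through a shell, characters such as `;` or `$(...)` in the configured string are treated as literal arguments rather than being interpreted.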

Works with any secret manager: 1Password CLI (op), Bitwarden (bw), pass, HashiCorp Vault, macOS Keychain (security), AWS Secrets Manager, etc.

MCP tool filtering

Codegraph's MCP server exposes 34 tools by default. For models with a small context window, you can shrink the schema by disabling tools you don't use:

```json
{
  "mcp": {
    "disabledTools": ["execution_flow", "sequence", "communities", "co_changes"]
  }
}
```

Names are matched case-insensitively and a leading codegraph<digits>_ prefix (e.g. codegraph2_module_map) is stripped before comparison. Disabled tools are removed from tools/list and any tools/call invocation returns Unknown tool: <name>. See docs/guides/configuration.md#mcp-tool-filtering for the full tool catalog, and the rest of that guide for every other config option.
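
The matching rule amounts to normalizing both sides before comparing them. The sketch below shows one possible reading: `normalizeToolName` and `filterTools` are hypothetical names, and requiring at least one digit in the prefix is an assumption based on the `codegraph<digits>_` wording.

```typescript
// Lower-case the name and strip a leading "codegraph<digits>_" prefix.
function normalizeToolName(name: string): string {
  return name.toLowerCase().replace(/^codegraph\d+_/, "");
}

// Returns the tools that remain visible after applying the disabled list.
function filterTools(allTools: string[], disabledTools: string[]): string[] {
  const blocked = new Set(disabledTools.map(normalizeToolName));
  return allTools.filter((tool) => !blocked.has(normalizeToolName(tool)));
}

console.log(normalizeToolName("codegraph2_module_map")); // module_map
console.log(filterTools(["module_map", "sequence", "co_changes"], ["codegraph2_Sequence", "CO_CHANGES"]));
```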

📖 Programmatic API

Codegraph also exports a full API for use in your own tools:

```js
import { buildGraph, queryNameData, findCycles, exportDOT, normalizeSymbol } from '@optave/codegraph';

// Build the graph
buildGraph('/path/to/project');

// Query programmatically
const results = queryNameData('myFunction', '/path/to/.codegraph/graph.db');
// All query results use normalizeSymbol for a stable 7-field schema
```

```js
import { parseFileAuto, getActiveEngine, isNativeAvailable } from '@optave/codegraph';

// Check which engine is active
console.log(getActiveEngine());   // 'native' or 'wasm'
console.log(isNativeAvailable()); // true if Rust addon is installed

// Parse a single file (uses auto-selected engine)
const symbols = await parseFileAuto('/path/to/file.ts');
```

```js
import { searchData, multiSearchData, buildEmbeddings } from '@optave/codegraph';

// Path to the graph database produced by the build
const dbPath = '/path/to/.codegraph/graph.db';

// Build embeddings (one-time)
await buildEmbeddings('/path/to/project');

// Single-query search
const { results } = await searchData('handle auth', dbPath);

// Multi-query search with RRF ranking
const { results: fused } = await multiSearchData(
  ['auth middleware', 'JWT validation'],
  dbPath,
  { limit: 10, minScore: 0.3 }
);
// Each result has: { name, kind, file, line, rrf, queryScores[] }
```
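
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the idea is that each query contributes 1 / (k + rank) to every result it returns, and the contributions are summed across queries. The sketch below is a generic version of that technique, not codegraph's implementation: `reciprocalRankFusion` is a hypothetical name and k = 60 is the constant commonly used in the literature, assumed here rather than taken from the project.

```typescript
interface Fused { id: string; rrf: number; }

// rankings: one ordered list of result ids per query, best match first.
function reciprocalRankFusion(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, rrf]) => ({ id, rrf })).sort((a, b) => b.rrf - a.rrf);
}

const fusedExample = reciprocalRankFusion([
  ["a", "b", "c"], // hypothetical results for 'auth middleware'
  ["b", "c", "a"], // hypothetical results for 'JWT validation'
]);
console.log(fusedExample.map((r) => r.id).join(",")); // b,a,c
```

Items that rank reasonably well for several queries rise above items that rank first for only one, which is why the fused order can differ from every individual list.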

⚠️ Limitations

  • TypeScript compiler integration is auto-enabled: when typescript is installed and a tsconfig.json is found, the TypeScript compiler API pass runs automatically. Disable it with "build": { "typescriptResolver": false } in .codegraphrc.json if you want faster builds. Heuristic type inference (annotations, new expressions, assignment chains) is always active as a baseline
  • Dynamic calls are best-effort — complex computed property access and eval patterns are not resolved
  • Python imports — resolves relative imports but doesn't follow sys.path or virtual environment packages
  • Dataflow analysis — interprocedural edges (arg_in, return_out) require a full build after adding new callee files; incremental re-stitch fires automatically on both the JS and native engine paths

🗺️ Roadmap

See ROADMAP.md for the full development roadmap and STABILITY.md for the stability policy and versioning guarantees. Current plan:

  1. Rust Core: Complete (v1.3.0) — native tree-sitter parsing via napi-rs, parallel multi-core parsing, incremental re-parsing, import resolution & cycle detection in Rust
  2. Foundation Hardening: Complete (v1.5.0) — parser registry, complete MCP, test coverage, enhanced config, multi-repo MCP
  3. Analysis Expansion: Complete (v2.7.0) — complexity metrics, community detection, flow tracing, co-change, manifesto, boundary rules, check, triage, audit, batch, hybrid search
  4. Deep Analysis & Graph Enrichment: Complete (v3.0.0) — dataflow analysis, intraprocedural CFG, AST node storage, expanded node/edge types, interactive viewer, exports command
  5. Architectural Refactoring: Complete (v3.1.5) — unified AST analysis, composable MCP, domain errors, builder pipeline, graph model, qualified names, presentation layer, CLI composability
  6. Resolution Accuracy: Complete (v3.3.1) — type inference, receiver type tracking, dead role sub-categories, resolution benchmarks, package.json exports, monorepo workspace resolution
  7. TypeScript Migration: Complete (v3.4.0) — all 271 source files migrated from JS to TS, zero .js remaining
  8. Native Analysis Acceleration: Complete (v3.5.0) — all build phases in Rust/rusqlite, sub-100ms incremental rebuilds, better-sqlite3 lazy-loaded as fallback only
  9. Expanded Language Support: Complete (v3.8.0) — 23 new languages in 4 batches (11 → 34), dual-engine WASM + Rust support for all
  10. Analysis Depth: Complete (v3.12.0) — TypeScript-native resolution, inter-procedural type propagation, field-based points-to analysis, barrel re-export chain resolution, CHA+RTA dynamic dispatch
  11. Runtime & Extensibility — event-driven pipeline, plugin system, query caching, pagination
  12. Quality, Security & Technical Debt — supply-chain security (SBOM, SLSA), CI coverage gates, timer cleanup, tech debt kill list
  13. Architectural Health — split god files (types.ts, dataflow.ts), missing ADRs, language quality tiers, per-edge confidence transparency
  14. Intelligent Embeddings — LLM-generated descriptions, enhanced embeddings, module summaries
  15. Natural Language Queries: codegraph ask command, conversational sessions
  16. GitHub Integration & CI — reusable GitHub Action, LLM-enhanced PR review, SARIF output
  17. Advanced Features — dead code detection, monorepo support, agentic search

🤝 Contributing

Contributions are welcome! See CONTRIBUTING.md for the full guide — setup, workflow, commit convention, testing, and architecture notes.

```bash
git clone https://github.com/optave/ops-codegraph-tool.git
cd ops-codegraph-tool   # git clone names the directory after the repository
npm install
npm test
```

Looking to add a new language? Check out Adding a New Language.

📄 License

Apache-2.0


<p align="center"> <sub>Built with <a href="https://tree-sitter.github.io/">tree-sitter</a> and <a href="https://github.com/WiseLibs/better-sqlite3">better-sqlite3</a>. Your code stays on your machine.</sub> </p>

This article is auto-generated from optave/ops-codegraph-tool via the GitHub API. Last fetched: 6/28/2026