Pi serini
A Minimalistic Search Agent
**pi-serini** is a minimalistic search agent. The project is written primarily in TypeScript, distributed under the MIT License, and first published in 2026. Key topics include agentic-search, deep-research, and deep-research-agent.
pi-serini
A reusable, reproducible pi search-agent workspace
There are many search agents, but this one is yours.

pi-serini is a reusable, benchmark-driven pi search-agent workspace for index-driven BM25 retrieval, agentic search, and benchmark-aware evaluation.
Current release status: v0.3.0 supports index-driven benchmark and agentic search workflows for MS MARCO v1 Passage (dl19, dl20) and BrowseComp-Plus, with benchmark-template included as a tiny local end-to-end demo benchmark. This release tracks the Pi package namespace migration to @earendil-works/* and requires pi 0.74.0 or newer.
The repo is now manifest-driven rather than BrowseComp-Plus-only:
- benchmark defaults live in typed registry entries under `src/benchmarks/`
- each run snapshots its resolved benchmark condition into `benchmark_manifest_snapshot.json`
- active Node.js/TypeScript control-plane entrypoints live under `src/orchestration/`
- compatibility-only TypeScript entrypoints live under `src/legacy/`
- shared runtime primitives live under `src/runtime/`
- legacy shell scripts remain available as compatibility shims
BrowseComp-Plus remains the default benchmark for reproducibility, but the same control plane now also supports MS MARCO v1 Passage and a tiny local benchmark-template demo benchmark.
Supported benchmarks
- `browsecomp-plus`: default packaged benchmark with query sets `q9`, `q100`, `q300`, and `qfull`
- `msmarco-v1-passage`: index-driven MS MARCO v1 passage benchmark with query sets `dl19` and `dl20`
- `benchmark-template`: tiny local end-to-end demo benchmark for development and validation
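
The catalog above can be pictured as a small typed registry. The following is a minimal sketch only: `BenchmarkEntry`, `BENCHMARKS`, and `resolveSelection` are hypothetical names chosen for illustration, and the real definitions under `src/benchmarks/` may be shaped differently. The benchmark ids and query sets are the ones listed in this README (`test` is the query set used by the demo run in the Quickstart).

```typescript
// Hypothetical registry shape; not the repo's actual types.
interface BenchmarkEntry {
  readonly id: string;
  readonly summary: string;
  readonly querySets: readonly string[];
}

// Ids and query sets copied from the "Supported benchmarks" list.
const BENCHMARKS: readonly BenchmarkEntry[] = [
  {
    id: "browsecomp-plus",
    summary: "default packaged benchmark",
    querySets: ["q9", "q100", "q300", "qfull"],
  },
  {
    id: "msmarco-v1-passage",
    summary: "index-driven MS MARCO v1 passage benchmark",
    querySets: ["dl19", "dl20"],
  },
  {
    id: "benchmark-template",
    summary: "tiny local end-to-end demo benchmark",
    querySets: ["test"],
  },
];

// Validate a (benchmark, query set) pair before launching anything.
function resolveSelection(benchmarkId: string, querySet: string): BenchmarkEntry {
  const entry = BENCHMARKS.find((b) => b.id === benchmarkId);
  if (entry === undefined) {
    const known = BENCHMARKS.map((b) => b.id).join(", ");
    throw new Error(`Unknown benchmark "${benchmarkId}". Known: ${known}`);
  }
  if (entry.querySets.indexOf(querySet) === -1) {
    throw new Error(
      `Benchmark "${benchmarkId}" has no query set "${querySet}". ` +
        `Known: ${entry.querySets.join(", ")}`,
    );
  }
  return entry;
}
```

Rejecting an unknown pair early, with the known values in the error message, is one way to keep a manifest-driven control plane from starting a run that cannot be reproduced.
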
To inspect the registered benchmark catalog from the CLI:
```bash
npm run bench -- benchmarks
```
Requirements
- `pi` `0.74.0` or newer, installed and logged in
- Node.js with `npx`
- Java 21+
- `python3`
- `uv`
- `curl` or `wget`
Supported developer environments:
- macOS
- Linux
If Java is installed in a non-standard location, set JAVA_HOME explicitly before running setup or benchmark commands.
Pi packages now live under the @earendil-works/* npm namespace. This repo depends on @earendil-works/pi-coding-agent and @earendil-works/pi-tui; use that namespace for any local extension or SDK imports rather than the retired @mariozechner/* package names.
Quickstart
1. Set up benchmark assets
BrowseComp-Plus base assets:
```bash
npm run setup:browsecomp-plus
```
BrowseComp-Plus decrypted ground truth is a separate opt-in step and requires an explicit decryption secret from the operator:
```bash
BROWSECOMP_PLUS_CANARY='...your secret...' \
  npm run setup:ground-truth:browsecomp-plus
```
MS MARCO v1 Passage:
```bash
npm run setup:msmarco-v1-passage
```
Tiny local demo benchmark:
```bash
npm run setup:benchmark -- --benchmark benchmark-template
```
2. Run a benchmark query set
Use the same generic command surface for every benchmark; only BENCHMARK and QUERY_SET change.
Default single-process launch:
```bash
BENCHMARK=msmarco-v1-passage \
QUERY_SET=dl19 \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set
```
Shared BM25 daemon (preferred package alias):
```bash
BENCHMARK=browsecomp-plus \
QUERY_SET=q9 \
MODEL=openai-codex/gpt-5.4-mini \
PI_BM25_RPC_PORT=50455 \
  npm run run:benchmark:query-set:shared-bm25
```
Sharded shared-daemon launch (preferred package alias):
```bash
BENCHMARK=browsecomp-plus \
QUERY_SET=q100 \
SHARD_COUNT=4 \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set:sharded-shared-bm25
```
Tiny local demo run:
```bash
BENCHMARK=benchmark-template \
QUERY_SET=test \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set
```
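
Because every launch shares the same environment-variable surface, a wrapper script can describe a run as data before handing it to `npm`. The sketch below is illustrative only: `LaunchMode`, `LaunchRequest`, and `planLaunch` are hypothetical names, not part of the repo. It only builds the script name and environment shown in the examples above and does not start any process.

```typescript
// Hypothetical launch description; mirrors only the variables shown above.
type LaunchMode =
  | { kind: "single" }
  | { kind: "shared-bm25"; rpcPort?: number }
  | { kind: "sharded-shared-bm25"; shardCount: number };

interface LaunchRequest {
  benchmark: string;
  querySet: string;
  model: string;
  mode: LaunchMode;
}

interface LaunchPlan {
  script: string; // npm script name, as used with `npm run`
  env: Record<string, string>;
}

function planLaunch(req: LaunchRequest): LaunchPlan {
  // Only BENCHMARK and QUERY_SET vary between benchmarks.
  const env: Record<string, string> = {
    BENCHMARK: req.benchmark,
    QUERY_SET: req.querySet,
    MODEL: req.model,
  };
  const base = "run:benchmark:query-set";

  switch (req.mode.kind) {
    case "single":
      return { script: base, env };
    case "shared-bm25":
      if (req.mode.rpcPort !== undefined) {
        env["PI_BM25_RPC_PORT"] = String(req.mode.rpcPort);
      }
      return { script: `${base}:shared-bm25`, env };
    case "sharded-shared-bm25":
      if (!Number.isInteger(req.mode.shardCount) || req.mode.shardCount < 1) {
        throw new Error("shardCount must be a positive integer");
      }
      env["SHARD_COUNT"] = String(req.mode.shardCount);
      return { script: `${base}:sharded-shared-bm25`, env };
  }
}
```

Modelling the mode as a discriminated union keeps mode-specific variables such as `SHARD_COUNT` from being set on a launch that would not use them.
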
BM25 tuning during benchmark runs
Benchmark runs accept BM25 tuning through environment variables:
- `PI_BM25_K1`: default `0.9`
- `PI_BM25_B`: default `0.4`
- `PI_BM25_THREADS`: default `1`
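
As an illustration of how these variables and defaults could be resolved, here is a minimal sketch. `Bm25Params`, `readNumber`, and `resolveBm25Params` are hypothetical names, and the range checks (`k1 >= 0`, `0 <= b <= 1`, `threads >= 1`) are assumptions based on the usual BM25 parameter ranges rather than documented behavior of this repo.

```typescript
interface Bm25Params {
  k1: number;
  b: number;
  threads: number;
}

// Defaults as documented in the list above.
const BM25_DEFAULTS: Bm25Params = { k1: 0.9, b: 0.4, threads: 1 };

type Env = Record<string, string | undefined>;

// Read one numeric variable, falling back when it is unset or blank.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

function resolveBm25Params(env: Env): Bm25Params {
  const k1 = readNumber(env, "PI_BM25_K1", BM25_DEFAULTS.k1);
  const b = readNumber(env, "PI_BM25_B", BM25_DEFAULTS.b);
  const threads = readNumber(env, "PI_BM25_THREADS", BM25_DEFAULTS.threads);

  // Assumed sanity checks; the real launcher may validate differently.
  if (k1 < 0) throw new Error("PI_BM25_K1 must be >= 0");
  if (b < 0 || b > 1) throw new Error("PI_BM25_B must be in [0, 1]");
  if (!Number.isInteger(threads) || threads < 1) {
    throw new Error("PI_BM25_THREADS must be a positive integer");
  }
  return { k1, b, threads };
}
```

In a Node.js script, passing `process.env` as the `env` argument would apply the same override-or-default behavior to the current shell environment.
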
Example with explicit BM25 tuning:
```bash
PI_BM25_K1=0.82 \
PI_BM25_B=0.68 \
BENCHMARK=msmarco-v1-passage \
QUERY_SET=dl19 \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set
```
Example with shared BM25 daemon tuning:
```bash
PI_BM25_K1=0.82 \
PI_BM25_B=0.68 \
PI_BM25_THREADS=4 \
BENCHMARK=browsecomp-plus \
QUERY_SET=q9 \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set:shared-bm25
```
Suggested BrowseComp-Plus parameters:
- `PI_BM25_K1=25`
- `PI_BM25_B=1`
Example:
```bash
PI_BM25_K1=25 \
PI_BM25_B=1 \
BENCHMARK=browsecomp-plus \
QUERY_SET=q9 \
MODEL=openai-codex/gpt-5.4-mini \
  npm run run:benchmark:query-set:shared-bm25
```
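
To make the role of the two parameters concrete, the sketch below computes the term-frequency component of the textbook Okapi BM25 formula. It is not taken from this repo: the JVM BM25 server may use a variant with different constant factors or a different idf, and `bm25TfComponent` is a hypothetical name. In this form, `k1` controls how quickly repeated occurrences of a term saturate, and `b` controls how strongly long documents are penalized.

```typescript
// Textbook Okapi BM25 term-frequency component (idf omitted):
//   tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
function bm25TfComponent(
  tf: number, // occurrences of the term in the document
  docLen: number, // document length in tokens
  avgDocLen: number, // average document length in the index
  k1: number,
  b: number,
): number {
  if (avgDocLen <= 0) throw new Error("avgDocLen must be positive");
  const lengthNorm = 1 - b + b * (docLen / avgDocLen);
  return (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

// Compare the default parameters with the suggested BrowseComp-Plus ones
// on a document twice the average length.
for (const [k1, b] of [[0.9, 0.4], [25, 1]] as const) {
  const scores = [1, 5, 20].map((tf) => bm25TfComponent(tf, 200, 100, k1, b).toFixed(3));
  console.log(`k1=${k1} b=${b} tf=1,5,20 ->`, scores.join(", "));
}
```

With a large `k1`, the component keeps growing as the term count rises instead of flattening early, and `b = 1` applies full length normalization.
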
For systematic BM25 parameter search rather than manual overrides, use:
```bash
npm run tune:bm25
```
3. Summarize and evaluate a run
Summarize:
```bash
RUN_DIR=runs/<run> npm run summarize:run
```
Retrieval evaluation:
```bash
RUN_DIR=runs/<run> npm run evaluate:retrieval
```
Judge evaluation:
```bash
INPUT_DIR=runs/<run> npm run evaluate:run
```
Generate a Markdown report:
```bash
RUN_DIR=runs/<run> npm run report:run
```
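
Note that judge evaluation reads `INPUT_DIR` while the other three commands read `RUN_DIR`. A small helper can keep that distinction in one place. The sketch below is hypothetical (`POST_RUN_STEPS`, `shellQuote`, and `renderStep` are illustrative names), assumes the four steps run in the order listed above, and only renders command lines without executing them.

```typescript
interface PostRunStep {
  script: string; // npm script name
  envName: "RUN_DIR" | "INPUT_DIR"; // variable that carries the run directory
}

// Scripts and variable names as shown in the commands above.
const POST_RUN_STEPS: readonly PostRunStep[] = [
  { script: "summarize:run", envName: "RUN_DIR" },
  { script: "evaluate:retrieval", envName: "RUN_DIR" },
  { script: "evaluate:run", envName: "INPUT_DIR" },
  { script: "report:run", envName: "RUN_DIR" },
];

// POSIX single-quote escaping so paths with spaces or quotes stay intact.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

function renderStep(step: PostRunStep, runDir: string): string {
  return `${step.envName}=${shellQuote(runDir)} npm run ${step.script}`;
}

// Print the full post-run sequence for one run directory.
for (const step of POST_RUN_STEPS) {
  console.log(renderStep(step, "runs/demo"));
}
```
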
benchctl operator workflow
Use the direct run:benchmark:* entrypoints when you want low-level benchmark execution with explicit benchmark and query-set control.
Use benchctl when you want the higher-level operator surface for:
- listing registered benchmarks and managed presets
- launching supervisor-managed runs
- checking run status and managed process state
- monitoring runs in the live terminal dashboard
Common commands:
List registered benchmarks and presets:
```bash
npm run bench -- benchmarks
```
Launch a managed shared run:
```bash
npm run bench -- run --preset q9_shared --model openai-codex/gpt-5.4-mini
```
Launch a managed sharded run:
```bash
npm run bench -- run --preset browsecomp-plus/qfull_sharded --model openai-codex/gpt-5.4-mini --shards 8
```
Inspect current run status:
```bash
npm run bench:status
npm run bench:managed
```
Open the live operator TUI:
```bash
npm run bench:tui
```
For the full managed-run and monitoring workflow, see Running benchmarks.
Preferred entrypoints
Preferred operator-facing commands are the Node-first package scripts:
- `npm run setup:benchmark`
- `npm run run:benchmark:query-set`
- `npm run run:benchmark:query-set:shared-bm25`
- `npm run run:benchmark:query-set:sharded-shared-bm25`
- `npm run summarize:run`
- `npm run evaluate:retrieval`
- `npm run evaluate:run`
- `npm run report:run`
- `npm run bench:tui`
Legacy shell scripts under scripts/ still work, but they are compatibility shims rather than the preferred control plane. The older package aliases run:benchmark:query-set:shared and run:benchmark:query-set:sharded also still work as compatibility aliases, but the preferred operator-facing names now say explicitly that these paths use a shared BM25 daemon. The two intentional shell-level implementation boundaries that remain are benchmark-scoped setup scripts and the thin BM25 JVM bootstrap script used by the typed BM25 launch helpers.
Repo layout
- `src/orchestration/`: active benchmark-first launch/setup/tuning control-plane entrypoints
- `src/legacy/`: compatibility-only TypeScript entrypoints that are still intentionally preserved for historical low-level contracts
- `src/runtime/`: shared runtime primitives such as prompt construction, artifact-path helpers, and isolated agent-dir handling
- `src/benchmarks/`: typed benchmark definitions, registry helpers, and run-manifest snapshot logic
- `src/wrappers/`: downstream summarize/eval/report wrapper entrypoints and precedence helpers
- `src/operator/`: monitor, supervisor, TUI, and benchctl operator surfaces
- `src/evaluation/`: retrieval and judge evaluation backends plus metric helpers
- `src/report/`: Markdown report generation and report-data helpers
- `src/bm25/`: BM25 subprocess startup and local transport helpers
- `src/pi-search/`: `pi` search extension and helpers
- `scripts/`: compatibility wrappers plus benchmark-scoped setup implementations and the thin BM25 JVM bootstrap script
- `jvm/`: JVM BM25 RPC server
- `data/<dataset>/...`: benchmark-scoped local dataset assets
- `indexes/<index-name>/`: benchmark-scoped local Lucene indexes
- `vendor/anserini/`: Anserini fatjar prepared locally by setup scripts
- `runs/`: benchmark run outputs
- `evals/`: evaluation outputs
- `notes/`: local notes and experiment writeups
Read more
- paper
- Project page
- Running benchmarks
- Evaluation semantics
- Reproducibility
- Adding a benchmark
- BM25 backend interface
- Released Run on BrowseComp-Plus (Canary to prevent leakage: piserini-a-minimal-search-agent)
Citation
```bibtex
@misc{hsu2026rethinkingagenticsearchpiserini,
  title         = {Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?},
  author        = {Tz-Huan Hsu and Jheng-Hong Yang and Jimmy Lin},
  year          = {2026},
  eprint        = {2605.10848},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2605.10848}
}
```
Notes
- Runs snapshot their resolved benchmark condition into `<run>/benchmark_manifest_snapshot.json`.
- Reports now prefer structured run setup metadata from `<run>/run_setup.json` and fall back to legacy launcher logs when needed.
- Do not track generated benchmark content under `data/`, `indexes/`, `runs/`, `evals/`, or `scratch/`.
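
The prefer-then-fall-back behavior for run setup metadata can be sketched as follows. This is an illustration under stated assumptions, not the report code itself: `loadRunSetup` and `ReadText` are hypothetical names, the legacy log file name `launcher.log` is a placeholder because the README does not name it, and treating malformed JSON as a reason to fall back is also an assumption. File access is injected so the sketch stays free of Node.js-specific imports.

```typescript
// Returns the file's text, or undefined when the file does not exist.
type ReadText = (path: string) => string | undefined;

type RunSetup =
  | { source: "run_setup.json"; data: unknown }
  | { source: "legacy-launcher-log"; raw: string }
  | { source: "none" };

function loadRunSetup(
  runDir: string,
  readText: ReadText,
  legacyLogName: string = "launcher.log", // placeholder name, not from the repo
): RunSetup {
  // Prefer the structured metadata file when it exists and parses.
  const structured = readText(`${runDir}/run_setup.json`);
  if (structured !== undefined) {
    try {
      return { source: "run_setup.json", data: JSON.parse(structured) };
    } catch {
      // Assumed behavior: unreadable JSON falls through to the legacy log.
    }
  }

  // Otherwise fall back to the legacy launcher log, if present.
  const legacy = readText(`${runDir}/${legacyLogName}`);
  if (legacy !== undefined) {
    return { source: "legacy-launcher-log", raw: legacy };
  }
  return { source: "none" };
}
```

Returning a tagged result lets report code state which source it used, which helps when comparing older runs with newer ones. A reader backed by the file system can be passed in as `readText` for real run directories.
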
Contact
Jheng-Hong (Matt) YANG: jhyang@stencilzeit.com
License
MIT
