GitPedia

Pi serini

A Minimalistic Search Agent

From justram·Updated June 5, 2026·View on GitHub·

**pi serini** is A Minimalistic Search Agent The project is written primarily in TypeScript, distributed under the MIT License license, first published in 2026. Key topics include: agentic-search, deep-research, deep-research-agent.

Latest release: v0.3.0
May 12, 2026View Changelog →
<div align="center">

pi-serini

A reusable, reproducible pi search-agent workspace

License
Pi package

There are many search agents, but this one is

yours.

</div>

pi-serini is a reusable, benchmark-driven pi search-agent workspace for index-driven BM25 retrieval, agentic search, and benchmark-aware evaluation.

Current release status: v0.3.0 supports index-driven benchmark and agentic search workflows for MS MARCO v1 Passage (dl19, dl20) and BrowseComp-Plus, with benchmark-template included as a tiny local end-to-end demo benchmark. This release tracks the Pi package namespace migration to @earendil-works/* and requires pi 0.74.0 or newer.

The repo is now manifest-driven rather than BrowseComp-Plus-only:

  • benchmark defaults live in typed registry entries under src/benchmarks/
  • each run snapshots its resolved benchmark condition into benchmark_manifest_snapshot.json
  • active Node.js/TypeScript control-plane entrypoints live under src/orchestration/
  • compatibility-only TypeScript entrypoints live under src/legacy/
  • shared runtime primitives live under src/runtime/
  • legacy shell scripts remain available as compatibility shims

BrowseComp-Plus remains the default benchmark for reproducibility, but the same control plane now also supports MS MARCO v1 Passage and a tiny local benchmark-template demo benchmark.

Supported benchmarks

  • browsecomp-plus — default packaged benchmark with query sets q9, q100, q300, and qfull
  • msmarco-v1-passage — index-driven MS MARCO v1 passage benchmark with query sets dl19 and dl20
  • benchmark-template — tiny local end-to-end demo benchmark for development and validation

To inspect the registered benchmark catalog from the CLI:

bash
npm run bench -- benchmarks

Requirements

  • pi 0.74.0 or newer installed and logged in
  • Node.js with npx
  • Java 21+
  • python3
  • uv
  • curl or wget

Supported developer environments:

  • macOS
  • Linux

If Java is installed in a non-standard location, set JAVA_HOME explicitly before running setup or benchmark commands.

Pi packages now live under the @earendil-works/* npm namespace. This repo depends on @earendil-works/pi-coding-agent and @earendil-works/pi-tui; use that namespace for any local extension or SDK imports rather than the retired @mariozechner/* package names.

Quickstart

1. Set up benchmark assets

BrowseComp-Plus base assets:

bash
npm run setup:browsecomp-plus

BrowseComp-Plus decrypted ground truth is a separate opt-in step and requires an explicit decryption secret from the operator:

bash
BROWSECOMP_PLUS_CANARY='...your secret...' \ npm run setup:ground-truth:browsecomp-plus

MS MARCO v1 Passage:

bash
npm run setup:msmarco-v1-passage

Tiny local demo benchmark:

bash
npm run setup:benchmark -- --benchmark benchmark-template

2. Run a benchmark query set

Use the same generic command surface for every benchmark; only BENCHMARK and QUERY_SET change.

Default single-process launch:

bash
BENCHMARK=msmarco-v1-passage \ QUERY_SET=dl19 \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set

Shared BM25 daemon (preferred package alias):

bash
BENCHMARK=browsecomp-plus \ QUERY_SET=q9 \ MODEL=openai-codex/gpt-5.4-mini \ PI_BM25_RPC_PORT=50455 \ npm run run:benchmark:query-set:shared-bm25

Sharded shared-daemon launch (preferred package alias):

bash
BENCHMARK=browsecomp-plus \ QUERY_SET=q100 \ SHARD_COUNT=4 \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set:sharded-shared-bm25

Tiny local demo run:

bash
BENCHMARK=benchmark-template \ QUERY_SET=test \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set

BM25 tuning during benchmark runs

Benchmark runs accept BM25 tuning through environment variables:

  • PI_BM25_K1 — default 0.9
  • PI_BM25_B — default 0.4
  • PI_BM25_THREADS — default 1

Example with explicit BM25 tuning:

bash
PI_BM25_K1=0.82 \ PI_BM25_B=0.68 \ BENCHMARK=msmarco-v1-passage \ QUERY_SET=dl19 \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set

Example with shared BM25 daemon tuning:

bash
PI_BM25_K1=0.82 \ PI_BM25_B=0.68 \ PI_BM25_THREADS=4 \ BENCHMARK=browsecomp-plus \ QUERY_SET=q9 \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set:shared-bm25

Suggested BrowseComp-Plus parameters:

  • PI_BM25_K1=25
  • PI_BM25_B=1

Example:

bash
PI_BM25_K1=25 \ PI_BM25_B=1 \ BENCHMARK=browsecomp-plus \ QUERY_SET=q9 \ MODEL=openai-codex/gpt-5.4-mini \ npm run run:benchmark:query-set:shared-bm25

For systematic BM25 parameter search rather than manual overrides, use:

bash
npm run tune:bm25

3. Summarize and evaluate a run

Summarize:

bash
RUN_DIR=runs/<run> npm run summarize:run

Retrieval evaluation:

bash
RUN_DIR=runs/<run> npm run evaluate:retrieval

Judge evaluation:

bash
INPUT_DIR=runs/<run> npm run evaluate:run

Generate a Markdown report:

bash
RUN_DIR=runs/<run> npm run report:run

benchctl operator workflow

Use the direct run:benchmark:* entrypoints when you want low-level benchmark execution with explicit benchmark and query-set control.

Use benchctl when you want the higher-level operator surface for:

  • listing registered benchmarks and managed presets
  • launching supervisor-managed runs
  • checking run status and managed process state
  • monitoring runs in the live terminal dashboard

Common commands:

List registered benchmarks and presets:

bash
npm run bench -- benchmarks

Launch a managed shared run:

bash
npm run bench -- run --preset q9_shared --model openai-codex/gpt-5.4-mini

Launch a managed sharded run:

bash
npm run bench -- run --preset browsecomp-plus/qfull_sharded --model openai-codex/gpt-5.4-mini --shards 8

Inspect current run status:

bash
npm run bench:status npm run bench:managed

Open the live operator TUI:

bash
npm run bench:tui

For the full managed-run and monitoring workflow, see Running benchmarks.

Preferred entrypoints

Preferred operator-facing commands are the Node-first package scripts:

  • npm run setup:benchmark
  • npm run run:benchmark:query-set
  • npm run run:benchmark:query-set:shared-bm25
  • npm run run:benchmark:query-set:sharded-shared-bm25
  • npm run summarize:run
  • npm run evaluate:retrieval
  • npm run evaluate:run
  • npm run report:run
  • npm run bench:tui

Legacy shell scripts under scripts/ still work, but they are compatibility shims rather than the preferred control plane. The older package aliases run:benchmark:query-set:shared and run:benchmark:query-set:sharded also still work as compatibility aliases, but the preferred operator-facing names now say explicitly that these paths use a shared BM25 daemon. The two intentional shell-level implementation boundaries that remain are benchmark-scoped setup scripts and the thin BM25 JVM bootstrap script used by the typed BM25 launch helpers.

Repo layout

  • src/orchestration/ — active benchmark-first launch/setup/tuning control-plane entrypoints
  • src/legacy/ — compatibility-only TypeScript entrypoints that are still intentionally preserved for historical low-level contracts
  • src/runtime/ — shared runtime primitives such as prompt construction, artifact-path helpers, and isolated agent-dir handling
  • src/benchmarks/ — typed benchmark definitions, registry helpers, run-manifest snapshot logic
  • src/wrappers/ — downstream summarize/eval/report wrapper entrypoints and precedence helpers
  • src/operator/ — monitor, supervisor, TUI, and benchctl operator surfaces
  • src/evaluation/ — retrieval and judge evaluation backends plus metric helpers
  • src/report/ — Markdown report generation and report-data helpers
  • src/bm25/ — BM25 subprocess startup and local transport helpers
  • src/pi-search/pi search extension and helpers
  • scripts/ — compatibility wrappers plus benchmark-scoped setup implementations and the thin BM25 JVM bootstrap script
  • jvm/ — JVM BM25 RPC server
  • data/<dataset>/... — benchmark-scoped local dataset assets
  • indexes/<index-name>/ — benchmark-scoped local Lucene indexes
  • vendor/anserini/ — Anserini fatjar prepared locally by setup scripts
  • runs/ — benchmark run outputs
  • evals/ — evaluation outputs
  • notes/ — local notes and experiment writeups

Read more

Citation

bibtex
@misc{hsu2026rethinkingagenticsearchpiserini, title = {Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?}, author = {Tz-Huan Hsu and Jheng-Hong Yang and Jimmy Lin}, year = {2026}, eprint = {2605.10848}, archivePrefix = {arXiv}, primaryClass = {cs.IR}, url = {https://arxiv.org/abs/2605.10848} }

Notes

  • Runs snapshot their resolved benchmark condition into <run>/benchmark_manifest_snapshot.json.
  • Reports now prefer structured run setup metadata from <run>/run_setup.json and fall back to legacy launcher logs when needed.
  • Do not track generated benchmark content under data/, indexes/, runs/, evals/, or scratch/.

Contact

Jheng-Hong (Matt) YANG: jhyang@stencilzeit.com

License

MIT

Contributors

Showing top 2 contributors by commit count.

View all contributors on GitHub →

This article is auto-generated from justram/pi-serini via the GitHub API.Last fetched: 6/19/2026