justram/pi-serini
A Minimalistic Search Agent
v0.3.0
✨ Added
- Added a BrowseComp-Plus external-run adapter at `src/adapters/import_search_jsonl_run.ts` and the package script `npm run adapt:search-jsonl-run` for normalizing one-JSON-object-per-line search-session artifacts into native run directories.
- Added focused coverage for the external-run importer and response-confidence calibration helpers.
- Added README links from @ricky42613 for the Pi-Serini project page and released BrowseComp-Plus run datasets on Hugging Face.
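As a rough illustration of what reading a one-JSON-object-per-line artifact can involve, here is a minimal sketch. The `parseJsonlObjects` helper and its error wording are hypothetical and are not taken from `src/adapters/import_search_jsonl_run.ts`.

```typescript
// Hypothetical helper: parse one-JSON-object-per-line text into records.
// Blank lines are skipped; failures report the 1-based line number.
type JsonObject = Record<string, unknown>;

function parseJsonlObjects(text: string): JsonObject[] {
  const records: JsonObject[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    if (line.trim() === "") return;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`line ${index + 1}: malformed JSON`);
    }
    // Only plain objects count as session records in this sketch.
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`line ${index + 1}: expected a JSON object`);
    }
    records.push(value as JsonObject);
  });
  return records;
}
```

Reporting the line number keeps a failed import easy to trace back to the offending record.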
📋 Changed
- Migrated Pi package dependencies and source imports from `@mariozechner/*` to `@earendil-works/*`.
- Updated `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` to `^0.74.0` and refreshed `package-lock.json`.
- Replaced Ajv-backed TypeBox validation with TypeBox v1 native compiler APIs while preserving protocol validation behavior and structured error metadata.
- Updated judge-evaluation calibration to compare each response's self-reported confidence against gold-answer correctness.
🐛 Fixed
- Fixed benchmark launches against the current Pi CLI by using the explicit-extension-compatible `--no-builtin-tools` behavior.
- Fixed shared-BM25 liveness detection for root-relative log paths.
- Fixed sharded shared-BM25 merge metadata handling so merged runs synthesize canonical merged-level metadata instead of failing on shard-local metadata differences.
- Fixed calibration computation to include a final partial confidence bin.
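The partial-bin fix can be pictured with a small expected-calibration-error sketch. It assumes equal-count bins over confidence-sorted samples; the actual binning scheme, the `Judged` shape, and the function name are assumptions, not the repo's implementation.

```typescript
// Hypothetical sample shape: confidence is a self-reported value in [0, 1].
interface Judged {
  confidence: number;
  correct: boolean;
}

// Expected calibration error over equal-count bins (an assumed scheme).
function expectedCalibrationError(samples: Judged[], binSize: number): number {
  if (!Number.isInteger(binSize) || binSize < 1) {
    throw new Error("binSize must be a positive integer");
  }
  if (samples.length === 0) return 0;
  const sorted = samples.slice().sort((a, b) => a.confidence - b.confidence);
  let weightedGap = 0;
  // Stepping until start < sorted.length keeps the final partial bin
  // instead of silently dropping the leftover samples.
  for (let start = 0; start < sorted.length; start += binSize) {
    const bin = sorted.slice(start, start + binSize);
    const meanConfidence =
      bin.reduce((sum, s) => sum + s.confidence, 0) / bin.length;
    const accuracy = bin.filter((s) => s.correct).length / bin.length;
    weightedGap +=
      (bin.length / sorted.length) * Math.abs(meanConfidence - accuracy);
  }
  return weightedGap;
}
```

With three samples and a bin size of two, the third sample forms its own bin and still contributes its weight to the result.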
📦 Upgrade notes
- Install/update Pi to `0.74.0` or newer.
- Use `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` in extension or SDK imports; the old `@mariozechner/*` Pi package names are retired upstream.
v0.2.3
✨ Added
- Added explicit document-visibility tiers for benchmark runs and downstream analysis: `surfaced_docids` for the full system-exposed retrieval pool, `previewed_docids` for result-page items actually shown to the model, and `agent_docids` for the union of documents the agent opened or cited. The benchmark runner, judge evaluation, run summarization, and Markdown reports now surface these tiers so retrieval diagnostics can distinguish hidden top-k availability from model-visible evidence and agent behavior.
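The three tiers can be sketched as a small record type. Only the field names `surfaced_docids`, `previewed_docids`, and `agent_docids` come from the release notes; the `buildDocVisibility` function and its input arguments are hypothetical.

```typescript
// Field names follow the release notes; everything else is illustrative.
interface DocVisibility {
  surfaced_docids: string[]; // full system-exposed retrieval pool
  previewed_docids: string[]; // result-page items shown to the model
  agent_docids: string[]; // union of documents the agent opened or cited
}

// Deduplicate while preserving first-seen order.
function uniqueInOrder(ids: string[]): string[] {
  return Array.from(new Set(ids));
}

function buildDocVisibility(
  surfaced: string[],
  previewed: string[],
  opened: string[],
  cited: string[],
): DocVisibility {
  return {
    surfaced_docids: uniqueInOrder(surfaced),
    previewed_docids: uniqueInOrder(previewed),
    agent_docids: uniqueInOrder(opened.concat(cited)),
  };
}
```

Keeping the tiers separate lets a diagnostic tell apart a relevant document that was retrieved but never shown from one that was shown but never opened.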
📋 Changed
- Reconnected BM25 helper-side preview rendering to the active `pi-search` Anserini adapter, so `search(...)` once again hydrates top BM25 hits with cheap title/excerpt previews instead of showing only `docid`, score, and the fallback “No snippet available from this backend” message. This restores meaningful result-page visibility for the agent on the BM25 path without requiring extra `read_document(...)` calls just to understand top-ranked hits.
- Moved the repo-local Anserini integration stack from `src/bm25/` to `src/search-providers/anserini/`, keeping the package-owned `pi-search` adapter surface separate from provider-owned transport/process construction and updating docs/tests to reflect the clearer provider boundary. (commit `82e51cd`)
- Changed retrieval-evaluation/report wording from the ambiguous legacy "agent-set" framing toward explicit surfaced/previewed/agent-behavior semantics while retaining compatibility aliases for older run artifacts and downstream consumers.
v0.2.2
✨ Added
- Added small stable machine-readable metadata to `pi-search` protocol errors:
  - `code`
  - `toolName`
  - `target`
  - `schemaName`
  - `fieldPath`
- Added benchmark-harness artifact support for structured `pi-search` failure metadata when recoverable tool failures include it.
- Added a whole-tree regression guard that fails if package-owned `src/pi-search/` modules import repo-owned `src/` layers.
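A minimal sketch of how such metadata might travel on an error object. The five field names come from the notes above; the field types, the `SketchProtocolError` class, and the example `code` value are assumptions.

```typescript
// Field names are from the release notes; the string types are assumed.
interface ProtocolErrorMetadata {
  code: string;
  toolName?: string;
  target?: string;
  schemaName?: string;
  fieldPath?: string;
}

// Hypothetical error class carrying the metadata alongside the message.
class SketchProtocolError extends Error {
  readonly metadata: ProtocolErrorMetadata;

  constructor(message: string, metadata: ProtocolErrorMetadata) {
    super(message);
    // Restore the prototype chain so instanceof works on older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "SketchProtocolError";
    this.metadata = metadata;
  }
}

// A harness-side helper: return metadata only when the failure carries it.
function extractFailureMetadata(
  error: unknown,
): ProtocolErrorMetadata | undefined {
  return error instanceof SketchProtocolError ? error.metadata : undefined;
}
```

A harness can then record `extractFailureMetadata(error)` in its artifacts and fall back to the plain message when the result is `undefined`.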
📋 Changed
- Changed `src/pi-search/extension.ts` to remain the package-owned extension registration layer, while keeping the repo-local BM25 composition seam in `src/extensions/pi_search.ts`.
- Changed shared JSONL ownership to `src/runtime/jsonl.ts` so it no longer sits under fake `pi-search` ownership.
- Changed `src/pi-search/protocol/parse.ts` to attach structured metadata for malformed JSON and schema-invalid payloads.
- Changed the package-owned Anserini adapter seam to depend on a `pi-search`-owned narrow helper transport interface instead of the repo-owned BM25 RPC client type.
- Changed the package-owned prompt dump env gate from `PI_BM25_DUMP_PROMPTS` to `PI_SEARCH_DUMP_PROMPTS`.
🐛 Fixed
- Fixed maintainer-facing docs to describe the current package-owned `pi-search` boundary and the thin repo-local BM25 wrapper honestly.
📦 Scope
- `v0.2.2` does not introduce a new backend kind or a new benchmark architecture milestone.
- This release is specifically about making the existing `pi-search` ownership boundary more honest and more durable by:
  - keeping structured protocol and harness metadata intentionally narrow and stable
  - reducing remaining cross-boundary coupling around the Anserini adapter seam
  - preventing future repo-owned import leaks from creeping back into package-owned `pi-search`
  - keeping maintainer docs aligned with the current in-repo package boundary
v0.2.1
✨ Added
- Added a generic `http-json` searcher adapter under `src/pi-search/searcher/adapters/http_json/adapter.ts`.
- Added explicit `pi-search` extension config support for HTTP-backed backends alongside the existing Anserini BM25 and mock adapters.
- Added benchmark-harness regression coverage for HTTP-backed `pi-search` behavior across the full tool surface:
  - `search`
  - `read_search_results`
  - `read_document`
📋 Changed
- Changed the top-level `pi-search` extension surface to be backend-agnostic in tool labels, descriptions, spill-directory naming, and runtime log prefixes.
- Changed `docs/pi-search-contract.md` to reflect the current `searcher/` subsystem layout and the benchmark-validated HTTP-backed adapter path.
🐛 Fixed
- Fixed HTTP-backed `pi-search` response handling so successful `2xx` responses are parsed through the shared searcher-contract parsers.
- Fixed HTTP-backed validation to preserve distinct failure classes for:
  - malformed JSON
  - schema-invalid payloads
  - backend execution failures
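One way to keep the three failure classes distinct is a tagged union, sketched below. The `kind` labels, the `classifyHttpResponse` name, and the type-guard parameter are hypothetical; the real adapter parses through the shared searcher-contract parsers.

```typescript
// Hypothetical outcome type: one variant per failure class, plus success.
type HttpSearchOutcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "malformed_json"; detail: string }
  | { kind: "schema_invalid"; detail: string }
  | { kind: "backend_failure"; status: number };

function classifyHttpResponse<T>(
  status: number,
  body: string,
  isValid: (value: unknown) => value is T,
): HttpSearchOutcome<T> {
  // Non-2xx responses are backend execution failures; the body is not parsed.
  if (status < 200 || status > 299) {
    return { kind: "backend_failure", status };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return { kind: "malformed_json", detail: "response body is not valid JSON" };
  }
  if (!isValid(parsed)) {
    return { kind: "schema_invalid", detail: "response does not match the expected shape" };
  }
  return { kind: "ok", value: parsed };
}
```

Callers can switch on `kind`, so a malformed body is never reported as a backend outage or the other way around.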
📦 Scope
- `v0.2.1` does not broaden the product into document-ingestion-first indexing.
- This release is specifically about strengthening the standalone `pi-search` contract by:
  - adding a real external-service-shaped backend path
  - keeping the extension surface honest and backend-agnostic
  - proving through benchmark-harness regressions that `pi-serini` validates both successful and recoverable `pi-search` tool behavior over that path
v0.2.0
✨ Added
- Added a dedicated `pi-search` protocol contract layer under `src/pi-search/protocol/`.
- Added TypeBox-authored protocol schemas, a shared Ajv runtime, explicit protocol error types, schema-backed payload parsers, and structured contract helpers for benchmark-harness consumers.
- Added focused regression coverage for:
  - malformed `pi-search` protocol payloads
  - extracted helper/spill ownership modules
  - repair-friendly tool failure messaging
  - contract-detail extraction helpers
  - benchmark-runner handling of recoverable `pi-search` extension failures
- + 1 more
📋 Changed
- Changed `src/pi-search/extension.ts` into a composition root over extracted `pi-search` subsystems:
  - protocol validation
  - helper runtime ownership
  - prompt policy
  - spill management
  - cached search state
  - tool handlers
- Changed benchmark execution to consume `pi-search`-owned structured result details instead of re-deriving active `pi-search` semantics from rendered tool output.
- + 1 more
🐛 Fixed
- Fixed the initial `pi-search` tool failure paths to return more repairable agent-loop feedback for:
  - empty `search` queries
  - unknown `read_search_results.search_id` values
  - missing `read_document` docids
- Fixed the ownership boundary so standalone extension contract definitions no longer live implicitly inside the benchmark harness.
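"Repairable" feedback here means a failure message that tells the agent what to try next. The sketch below shows the idea for an unknown `search_id`; the cache shape, the `readSearchResults` signature, and the message wording are all hypothetical.

```typescript
type ReadResult =
  | { ok: true; docids: string[] }
  | { ok: false; message: string };

// Hypothetical lookup against a cache of earlier searches keyed by search_id.
function readSearchResults(
  cache: Map<string, string[]>,
  searchId: string,
): ReadResult {
  const docids = cache.get(searchId);
  if (docids !== undefined) return { ok: true, docids };
  // Include the valid IDs so the agent can correct its next call.
  const known = Array.from(cache.keys());
  const hint =
    known.length > 0
      ? `Known search_id values: ${known.join(", ")}.`
      : "No searches are cached yet; call search first.";
  return { ok: false, message: `Unknown search_id "${searchId}". ${hint}` };
}
```

The failure branch returns a value rather than throwing, which keeps the agent loop running and makes the failure countable in a benchmark run.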
📦 Scope
- `v0.2.0` is not a repo-wide JSON validation rewrite.
- This release is specifically about:
  - `pi-search` owning its standalone extension contract
  - `pi-serini` consuming and benchmarking that contract correctly
  - making extension failures more explicit and measurable inside benchmark runs
v0.1.5
🐛 Fixed
- Added `ajv` as the runtime validation backend while keeping `@sinclair/typebox` as the schema authoring layer.
- Hardened untrusted process-boundary JSON parsing for:
  - BM25 helper RPC responses
  - BM25 helper ping handshake payloads
  - shared BM25 server readiness payloads
  - pi JSON event lines consumed by benchmark and judge runners
  - extension-side BM25 `search`, `render_search_results`, and `read_document` responses
- Replaced unchecked `JSON.parse(...) as T` casts on those boundaries with schema-backed validation and clearer error messages.
- + 1 more
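The difference between an unchecked `JSON.parse(...) as T` cast and a validated parse can be sketched without any schema library. The release itself uses TypeBox schemas with an Ajv runtime; the hand-written guard, the `PingPayload` shape, and the `parseBoundaryJson` name below are stand-in assumptions.

```typescript
// Hypothetical payload shape for a helper ping handshake.
interface PingPayload {
  ready: boolean;
  pid: number;
}

// A hand-written guard standing in for a compiled schema check.
function isPingPayload(value: unknown): value is PingPayload {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["ready"] === "boolean" && typeof record["pid"] === "number";
}

// Parse untrusted text and validate it before handing back a typed value.
function parseBoundaryJson<T>(
  raw: string,
  guard: (value: unknown) => value is T,
  label: string,
): T {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`${label}: malformed JSON`);
  }
  if (!guard(parsed)) {
    throw new Error(`${label}: payload failed validation`);
  }
  return parsed;
}
```

An `as T` cast would let a payload such as `{"ready":"yes"}` through and fail later in unrelated code; here the error names the boundary where the bad data arrived.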
📦 Scope
- `v0.1.5` does not change benchmark semantics or retrieval behavior. It makes runtime failures at JSON process boundaries more explicit and safer to diagnose.
v0.1.4
🐛 Fixed
- Moved runtime-required packages from `devDependencies` to `dependencies` in `package.json`:
  - `@mariozechner/pi-coding-agent`
  - `@sinclair/typebox`
  - `tsx`
- Refreshed `package-lock.json` with `npm install` so the lockfile matches the corrected runtime dependency split.
📦 Scope
- `v0.1.4` does not change benchmark or search behavior. It fixes install/runtime dependency classification for the repo's operator-facing commands and runtime extension surface.
v0.1.3
✨ Added
- Added explicit BM25 tuning documentation to `README.md` and `docs/running-benchmarks.md`.
- Documented the benchmark-run environment variables:
  - `PI_BM25_K1`
  - `PI_BM25_B`
  - `PI_BM25_THREADS`
- Added runnable examples for single-process, shared-daemon, and sharded shared-daemon benchmark runs with manual BM25 overrides.
- Added the suggested BrowseComp-Plus BM25 parameters:
  - `k1 = 25`
- + 1 more
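A sketch of how a runner might read these overrides. The three variable names are from the notes above; the `readBm25Overrides` function, its validation rules, and the choice to leave unset values undefined (rather than assume defaults) are illustrative assumptions.

```typescript
interface Bm25Overrides {
  k1?: number;
  b?: number;
  threads?: number;
}

// Takes the environment as a plain record so the sketch stays testable.
function readBm25Overrides(
  env: Record<string, string | undefined>,
): Bm25Overrides {
  const parse = (name: string): number | undefined => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return undefined;
    const value = Number(raw);
    if (!Number.isFinite(value)) {
      throw new Error(`${name} must be numeric, got "${raw}"`);
    }
    return value;
  };

  const overrides: Bm25Overrides = {};
  const k1 = parse("PI_BM25_K1");
  const b = parse("PI_BM25_B");
  const threads = parse("PI_BM25_THREADS");
  if (k1 !== undefined) overrides.k1 = k1;
  if (b !== undefined) overrides.b = b;
  if (threads !== undefined) {
    if (!Number.isInteger(threads) || threads < 1) {
      throw new Error("PI_BM25_THREADS must be a positive integer");
    }
    overrides.threads = threads;
  }
  return overrides;
}
```

For example, `readBm25Overrides({ PI_BM25_K1: "25" })` yields an object with only `k1` set, leaving the other parameters to whatever the backend uses by default.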
📦 Scope
- `v0.1.3` does not change runtime product behavior. It makes BM25 tuning during benchmark execution much more explicit for operators.
v0.1.2
✨ Added
- Added a dedicated `benchctl` operator workflow section to `README.md`.
- Documented how to use `benchctl` for:
  - listing registered benchmarks and managed presets
  - launching supervisor-managed runs
  - inspecting run and managed process status
  - opening the live operator TUI
🐛 Fixed
- Replaced the non-portable local `@mariozechner/pi-tui` `file:` dependency with the published npm package.
- Refreshed `package-lock.json` with `npm install` so installs no longer depend on a local sibling checkout layout.
📦 Scope
- `v0.1.2` does not change runtime product behavior. It improves README operator-workflow discoverability and fixes standalone installation portability after `v0.1.1`.
v0.1.1
🐛 Fixed
- Hardened the detached-process runtime test in `tests/runtime_process.test.ts` to wait for actual stdout and stderr file contents before asserting success.
- Eliminated an intermittent push-time test failure caused by treating a completion marker file as proof that detached stdout/stderr output had already flushed.
📦 Scope
- `v0.1.1` does not change runtime product behavior. It is a test-stability patch release following `v0.1.0`.
v0.1.0
📦 Highlights
- Index-driven benchmark and agentic search workflows for:
  - MS MARCO v1 Passage with `dl19` and `dl20`
  - BrowseComp-Plus with `q9`, `q100`, `q300`, and `qfull`
- BM25-backed `pi` search extension with search and document-reading flows over Lucene indexes
- Shared BM25 RPC execution with single-process, shared-daemon, and sharded shared-daemon launch modes
- Benchmark-aware retrieval evaluation, judge evaluation, summarization, and Markdown reporting
- Reproducible run manifests via per-run `benchmark_manifest_snapshot.json`
📦 Benchmarks
- `browsecomp-plus` — default packaged benchmark
- `msmarco-v1-passage` — MS MARCO v1 passage support for `dl19` and `dl20`
- `benchmark-template` — tiny local end-to-end demo benchmark for development and validation
📦 Platform capabilities
- Typed benchmark registry and benchmark-aware query/qrels/index resolution
- Node.js/TypeScript-first orchestration entrypoints under `src/orchestration/`
- Managed launch presets and operator surfaces under `src/operator/`
- Internal and `trec_eval`-backed retrieval evaluation paths
- Judge evaluation with benchmark-aware mode defaults and validation
- BM25 comparison and tuning tooling
📦 Scope note
- This release is intentionally index-driven: benchmark runs execute against prepared Lucene indexes.
- Document-ingestion-first indexing workflows built around Anserini `IndexCollection` are planned next, but are not part of `v0.1.0`.
📦 Getting started
```bash
npm run setup:browsecomp-plus
npm run setup:msmarco-v1-passage
BENCHMARK=msmarco-v1-passage \
QUERY_SET=dl19 \
MODEL=openai-codex/gpt-5.4-mini \
npm run run:benchmark:query-set
```
- + 5 more
