justram/pi-serini

A Minimalistic Search Agent

11 Releases

Latest: 1mo ago

v0.3.0 (Latest)

justram·1mo ago·May 12, 2026

✨ Added

Added a BrowseComp-Plus external-run adapter at `src/adapters/import_search_jsonl_run.ts` and the package script `npm run adapt:search-jsonl-run` for normalizing one-JSON-object-per-line search-session artifacts into native run directories.
Added focused coverage for the external-run importer and response-confidence calibration helpers.
Added README links from @ricky42613 for the Pi-Serini project page and released BrowseComp-Plus run datasets on Hugging Face.
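As a rough illustration of what "one-JSON-object-per-line" normalization involves, the sketch below parses such a file into records and collects per-line errors. It is not the code in `src/adapters/import_search_jsonl_run.ts`; the `query_id` field and the `parseSearchJsonl` name are assumptions made for the example.

```typescript
// Hypothetical record shape: the importer's real schema is not shown in these notes.
interface SearchSessionRecord {
  query_id: string;
  [key: string]: unknown;
}

interface LineError {
  line: number; // 1-based line number in the input text
  message: string;
}

interface ParsedJsonl {
  records: SearchSessionRecord[];
  errors: LineError[];
}

function parseSearchJsonl(text: string): ParsedJsonl {
  const records: SearchSessionRecord[] = [];
  const errors: LineError[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    const trimmed = raw.trim();
    if (trimmed === "") return; // tolerate blank lines and a trailing newline
    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch (err) {
      errors.push({ line: index + 1, message: `invalid JSON: ${(err as Error).message}` });
      return;
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      errors.push({ line: index + 1, message: "expected a JSON object" });
      return;
    }
    const obj = value as Record<string, unknown>;
    const queryId = obj["query_id"];
    if (typeof queryId !== "string") {
      errors.push({ line: index + 1, message: "missing string field query_id" });
      return;
    }
    records.push({ ...obj, query_id: queryId });
  });
  return { records, errors };
}
```

Collecting errors with line numbers, rather than throwing on the first bad line, makes it easier to report every problem in an external artifact in one pass.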

📋 Changed

Migrated Pi package dependencies and source imports from `@mariozechner/*` to `@earendil-works/*`.
Updated `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` to `^0.74.0` and refreshed `package-lock.json`.
Replaced Ajv-backed TypeBox validation with TypeBox v1 native compiler APIs while preserving protocol validation behavior and structured error metadata.
Updated judge-evaluation calibration to use response self-reported confidence against gold-answer correctness.

🐛 Fixed

Fixed benchmark launches against the current Pi CLI by using the explicit-extension-compatible `--no-builtin-tools` behavior.
Fixed shared-BM25 liveness detection for root-relative log paths.
Fixed sharded shared-BM25 merge metadata handling so merged runs synthesize canonical merged-level metadata instead of failing on shard-local metadata differences.
Fixed calibration computation to include a final partial confidence bin.
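The partial-bin fix can be pictured with a small sketch. It assumes calibration is computed over fixed-size bins of samples sorted by confidence, which these notes do not confirm; `calibrationBins` and `expectedCalibrationError` are hypothetical names.

```typescript
interface ScoredResponse {
  confidence: number; // self-reported confidence in [0, 1]
  correct: boolean;   // judged correctness against the gold answer
}

interface CalibrationBin {
  count: number;
  meanConfidence: number;
  accuracy: number;
}

// Assumed scheme: sort by confidence, then cut into bins of `binSize` samples.
function calibrationBins(samples: ScoredResponse[], binSize: number): CalibrationBin[] {
  if (!Number.isInteger(binSize) || binSize <= 0) {
    throw new Error("binSize must be a positive integer");
  }
  const sorted = [...samples].sort((a, b) => a.confidence - b.confidence);
  const bins: CalibrationBin[] = [];
  // Looping until sorted.length (not a multiple of binSize) keeps the final partial bin.
  for (let start = 0; start < sorted.length; start += binSize) {
    const chunk = sorted.slice(start, start + binSize);
    const meanConfidence = chunk.reduce((sum, s) => sum + s.confidence, 0) / chunk.length;
    const accuracy = chunk.filter((s) => s.correct).length / chunk.length;
    bins.push({ count: chunk.length, meanConfidence, accuracy });
  }
  return bins;
}

// Count-weighted mean gap between accuracy and mean confidence across bins.
function expectedCalibrationError(bins: CalibrationBin[]): number {
  const total = bins.reduce((sum, b) => sum + b.count, 0);
  if (total === 0) return 0;
  return bins.reduce(
    (sum, b) => sum + (b.count / total) * Math.abs(b.accuracy - b.meanConfidence),
    0,
  );
}
```

With five samples and a bin size of two, this yields three bins, the last holding a single sample. Dropping that last bin would silently ignore the highest-confidence responses.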

📦 Upgrade notes

Install/update Pi to `0.74.0` or newer.
Use `@earendil-works/pi-coding-agent` and `@earendil-works/pi-tui` in extension or SDK imports; the old `@mariozechner/*` Pi package names are retired upstream.

v0.2.3

justram·2mo ago·April 3, 2026

✨ Added

Added explicit document-visibility tiers for benchmark runs and downstream analysis: `surfaced_docids` for the full system-exposed retrieval pool, `previewed_docids` for result-page items actually shown to the model, and `agent_docids` for the union of documents the agent opened or cited. The benchmark runner, judge evaluation, run summarization, and Markdown reports now surface these tiers so retrieval diagnostics can distinguish hidden top-k availability from model-visible evidence and agent behavior.
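A minimal sketch of how the three tiers relate, using the field names from this entry. The input shape and the helper names `buildVisibility` and `tiersContaining` are assumptions for illustration, not the benchmark runner's actual types.

```typescript
interface DocVisibility {
  surfaced_docids: string[];  // full system-exposed retrieval pool
  previewed_docids: string[]; // result-page items actually shown to the model
  agent_docids: string[];     // union of documents the agent opened or cited
}

function unique(ids: string[]): string[] {
  return Array.from(new Set(ids)); // keeps first-seen order
}

// Hypothetical input: raw docid lists collected while a run executes.
function buildVisibility(input: {
  surfaced: string[];
  previewed: string[];
  opened: string[];
  cited: string[];
}): DocVisibility {
  return {
    surfaced_docids: unique(input.surfaced),
    previewed_docids: unique(input.previewed),
    agent_docids: unique([...input.opened, ...input.cited]),
  };
}

// Which tiers contain a given docid, for example a gold document.
function tiersContaining(docid: string, v: DocVisibility): string[] {
  const tiers: string[] = [];
  if (v.surfaced_docids.includes(docid)) tiers.push("surfaced");
  if (v.previewed_docids.includes(docid)) tiers.push("previewed");
  if (v.agent_docids.includes(docid)) tiers.push("agent");
  return tiers;
}
```

A gold document that appears only in the surfaced tier was retrievable but never shown to the model, which is the "hidden top-k availability" case this entry describes.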

📋 Changed

Reconnected BM25 helper-side preview rendering to the active `pi-search` Anserini adapter, so `search(...)` once again hydrates top BM25 hits with cheap title/excerpt previews instead of showing only `docid`, score, and the fallback “No snippet available from this backend” message. This restores meaningful result-page visibility for the agent on the BM25 path without requiring extra `read_document(...)` calls just to understand top-ranked hits.
Moved the repo-local Anserini integration stack from `src/bm25/` to `src/search-providers/anserini/`, keeping the package-owned `pi-search` adapter surface separate from provider-owned transport/process construction and updating docs/tests to reflect the clearer provider boundary. (commit `82e51cd`)
Changed retrieval-evaluation/report wording from the ambiguous legacy "agent-set" framing to explicit surfaced/previewed/agent-behavior semantics, while retaining compatibility aliases for older run artifacts and downstream consumers.

v0.2.2

justram·2mo ago·March 25, 2026

✨ Added

Added small stable machine-readable metadata to `pi-search` protocol errors:
`code`
`toolName`
`target`
`schemaName`
`fieldPath`
Added benchmark-harness artifact support for structured `pi-search` failure metadata when recoverable tool failures include it.
Added a whole-tree regression guard that fails if package-owned `src/pi-search/` modules import repo-owned `src/` layers.
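For readers consuming the error metadata, the sketch below shows one way the five fields could be typed and read back defensively. Only the field names come from this release; the field types, the optionality, the `metadata` property name, and the sample `code` value are assumptions.

```typescript
// Field names are from the release notes; types and optionality are assumed.
interface PiSearchErrorMetadata {
  code: string;
  toolName?: string;
  target?: string;
  schemaName?: string;
  fieldPath?: string;
}

// Hypothetical helper: attach metadata to an ordinary Error.
function protocolError(
  message: string,
  metadata: PiSearchErrorMetadata,
): Error & { metadata: PiSearchErrorMetadata } {
  return Object.assign(new Error(message), { metadata });
}

// Read metadata from an unknown thrown value without trusting its shape.
function extractMetadata(err: unknown): PiSearchErrorMetadata | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const candidate = (err as { metadata?: unknown }).metadata;
  if (typeof candidate !== "object" || candidate === null) return undefined;
  const fields = candidate as Record<string, unknown>;
  const code = fields["code"];
  if (typeof code !== "string") return undefined;
  const out: PiSearchErrorMetadata = { code };
  for (const key of ["toolName", "target", "schemaName", "fieldPath"] as const) {
    const value = fields[key];
    if (typeof value === "string") out[key] = value; // ignore non-string values
  }
  return out;
}
```

A harness can then record `extractMetadata(err)` alongside the rendered failure message when it is present, and fall back to the message alone when it is not.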

📋 Changed

Changed `src/pi-search/extension.ts` to remain the package-owned extension registration layer, while keeping the repo-local BM25 composition seam in `src/extensions/pi_search.ts`.
Changed shared JSONL ownership to `src/runtime/jsonl.ts` so it no longer sits under fake `pi-search` ownership.
Changed `src/pi-search/protocol/parse.ts` to attach structured metadata for malformed JSON and schema-invalid payloads.
Changed the package-owned Anserini adapter seam to depend on a `pi-search`-owned narrow helper transport interface instead of the repo-owned BM25 RPC client type.
Changed the package-owned prompt dump env gate from `PI_BM25_DUMP_PROMPTS` to `PI_SEARCH_DUMP_PROMPTS`.

🐛 Fixed

Fixed maintainer-facing docs to describe the current package-owned `pi-search` boundary and the thin repo-local BM25 wrapper honestly.

📦 Scope

`v0.2.2` does not introduce a new backend kind or a new benchmark architecture milestone.
This release is specifically about making the existing `pi-search` ownership boundary more honest and more durable by:
keeping structured protocol and harness metadata intentionally narrow and stable
reducing remaining cross-boundary coupling around the Anserini adapter seam
preventing future repo-owned import leaks from creeping back into package-owned `pi-search`
keeping maintainer docs aligned with the current in-repo package boundary

v0.2.1

justram·2mo ago·March 24, 2026

✨ Added

Added a generic `http-json` searcher adapter under `src/pi-search/searcher/adapters/http_json/adapter.ts`.
Added explicit `pi-search` extension config support for HTTP-backed backends alongside the existing Anserini BM25 and mock adapters.
Added benchmark-harness regression coverage for HTTP-backed `pi-search` behavior across the full tool surface:
`search`
`read_search_results`
`read_document`

📋 Changed

Changed the top-level `pi-search` extension surface to be backend-agnostic in tool labels, descriptions, spill-directory naming, and runtime log prefixes.
Changed `docs/pi-search-contract.md` to reflect the current `searcher/` subsystem layout and the benchmark-validated HTTP-backed adapter path.

🐛 Fixed

Fixed HTTP-backed `pi-search` response handling so successful `2xx` responses are parsed through the shared searcher-contract parsers.
Fixed HTTP-backed validation to preserve distinct failure classes for:
malformed JSON
schema-invalid payloads
backend execution failures
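The three failure classes can be kept apart with a small result union, sketched below. The `kind` names, the `classifyHttpResponse` helper, and the choice to treat every non-`2xx` status as a backend execution failure are assumptions; the real adapter parses through the shared searcher-contract parsers rather than a hand-written guard.

```typescript
type HttpSearchOutcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "malformed_json"; detail: string }
  | { kind: "schema_invalid"; detail: string }
  | { kind: "backend_failure"; status: number };

function classifyHttpResponse<T>(
  status: number,
  body: string,
  isValid: (value: unknown) => value is T,
): HttpSearchOutcome<T> {
  // Assumption: any non-2xx status counts as a backend execution failure.
  if (status < 200 || status >= 300) {
    return { kind: "backend_failure", status };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    return { kind: "malformed_json", detail: (err as Error).message };
  }
  if (!isValid(parsed)) {
    return { kind: "schema_invalid", detail: "payload does not match the expected shape" };
  }
  return { kind: "ok", value: parsed };
}

// Hypothetical payload shape used only for this example.
interface SearchPayload {
  hits: string[];
}

function isSearchPayload(value: unknown): value is SearchPayload {
  if (typeof value !== "object" || value === null) return false;
  const hits = (value as Record<string, unknown>)["hits"];
  return Array.isArray(hits) && hits.every((h) => typeof h === "string");
}
```

Keeping the classes distinct lets a benchmark harness count transport problems, contract violations, and backend errors separately instead of lumping them into one failure bucket.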

📦 Scope

`v0.2.1` does not broaden the product into document-ingestion-first indexing.
This release is specifically about strengthening the standalone `pi-search` contract by:
adding a real external-service-shaped backend path
keeping the extension surface honest and backend-agnostic
proving through benchmark-harness regressions that `pi-serini` validates both successful and recoverable `pi-search` tool behavior over that path

v0.2.0

justram·2mo ago·March 24, 2026

✨ Added

Added a dedicated `pi-search` protocol contract layer under `src/pi-search/protocol/`.
Added TypeBox-authored protocol schemas, a shared Ajv runtime, explicit protocol error types, schema-backed payload parsers, and structured contract helpers for benchmark-harness consumers.
Added focused regression coverage for:
malformed `pi-search` protocol payloads
extracted helper/spill ownership modules
repair-friendly tool failure messaging
contract-detail extraction helpers
benchmark-runner handling of recoverable `pi-search` extension failures
+ 1 more

📋 Changed

Changed `src/pi-search/extension.ts` into a composition root over extracted `pi-search` subsystems:
protocol validation
helper runtime ownership
prompt policy
spill management
cached search state
tool handlers
Changed benchmark execution to consume `pi-search`-owned structured result details instead of re-deriving active `pi-search` semantics from rendered tool output.
+ 1 more

🐛 Fixed

Fixed the initial `pi-search` tool failure paths to return more repairable agent-loop feedback for:
empty `search` queries
unknown `read_search_results.search_id` values
missing `read_document` docids
Fixed the ownership boundary so standalone extension contract definitions no longer live implicitly inside the benchmark harness.

📦 Scope

`v0.2.0` is not a repo-wide JSON validation rewrite.
This release is specifically about:
`pi-search` owning its standalone extension contract
`pi-serini` consuming and benchmarking that contract correctly
making extension failures more explicit and measurable inside benchmark runs

v0.1.5

justram·2mo ago·March 24, 2026

🐛 Fixed

Added `ajv` as the runtime validation backend while keeping `@sinclair/typebox` as the schema authoring layer.
Hardened untrusted process-boundary JSON parsing for:
BM25 helper RPC responses
BM25 helper ping handshake payloads
shared BM25 server readiness payloads
pi JSON event lines consumed by benchmark and judge runners
extension-side BM25 `search`, `render_search_results`, and `read_document` responses
Replaced unchecked `JSON.parse(...) as T` casts on those boundaries with schema-backed validation and clearer error messages.
+ 1 more
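To make the `JSON.parse(...) as T` change concrete, here is a dependency-free sketch of the pattern. The release itself uses TypeBox schemas with Ajv; the hand-written type guard, the `parseJsonAs` helper, and the `PingPayload` shape below are stand-ins invented for the example.

```typescript
// Before: the cast compiles, but nothing checks the payload at runtime.
//   const ping = JSON.parse(line) as PingPayload;

// After: parse to `unknown`, then validate before the value is typed.
function parseJsonAs<T>(
  text: string,
  isValid: (value: unknown) => value is T,
  label: string,
): T {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch (err) {
    throw new Error(`${label}: malformed JSON (${(err as Error).message})`);
  }
  if (!isValid(value)) {
    throw new Error(`${label}: payload does not match the expected shape`);
  }
  return value;
}

// Hypothetical handshake payload, not the helper's real wire format.
interface PingPayload {
  ok: boolean;
}

function isPingPayload(value: unknown): value is PingPayload {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>)["ok"] === "boolean"
  );
}
```

The `label` argument names the boundary in the error message, so a failure points at the process boundary that produced the bad payload instead of surfacing later as an undefined property access.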

📦 Scope

`v0.1.5` does not change benchmark semantics or retrieval behavior. It makes runtime failures at JSON process boundaries more explicit and safer to diagnose.

v0.1.4

justram·2mo ago·March 23, 2026

🐛 Fixed

Moved runtime-required packages from `devDependencies` to `dependencies` in `package.json`:
`@mariozechner/pi-coding-agent`
`@sinclair/typebox`
`tsx`
Refreshed `package-lock.json` with `npm install` so the lockfile matches the corrected runtime dependency split.

📦 Scope

`v0.1.4` does not change benchmark or search behavior. It fixes install/runtime dependency classification for the repo's operator-facing commands and runtime extension surface.

v0.1.3

justram·2mo ago·March 23, 2026

✨ Added

Added explicit BM25 tuning documentation to `README.md` and `docs/running-benchmarks.md`.
Documented the benchmark-run environment variables:
`PI_BM25_K1`
`PI_BM25_B`
`PI_BM25_THREADS`
Added runnable examples for single-process, shared-daemon, and sharded shared-daemon benchmark runs with manual BM25 overrides.
Added the suggested BrowseComp-Plus BM25 parameters:
`k1 = 25`
+ 1 more
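The sketch below shows how a launcher might read the three documented variables into typed overrides. Only the variable names and the `k1 = 25` value come from these notes; `readBm25Overrides`, the validation rules, and the decision to return no defaults are assumptions, and the project's own parsing may differ.

```typescript
interface Bm25Overrides {
  k1?: number;
  b?: number;
  threads?: number;
}

function parseNumber(raw: string | undefined, name: string): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined; // unset: no override
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

// Takes the environment as a plain map so it can be called with process.env or a test map.
function readBm25Overrides(env: Record<string, string | undefined>): Bm25Overrides {
  const overrides: Bm25Overrides = {};
  const k1 = parseNumber(env["PI_BM25_K1"], "PI_BM25_K1");
  const b = parseNumber(env["PI_BM25_B"], "PI_BM25_B");
  const threads = parseNumber(env["PI_BM25_THREADS"], "PI_BM25_THREADS");
  if (k1 !== undefined) overrides.k1 = k1;
  if (b !== undefined) overrides.b = b;
  if (threads !== undefined) {
    if (!Number.isInteger(threads) || threads <= 0) {
      throw new Error("PI_BM25_THREADS must be a positive integer");
    }
    overrides.threads = threads;
  }
  return overrides;
}
```

Returning only the values that were explicitly set leaves the choice of defaults to the caller, so an unset variable never masks the index or benchmark defaults.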

📦 Scope

`v0.1.3` does not change runtime product behavior. It makes BM25 tuning during benchmark execution much more explicit for operators.

v0.1.2

justram·2mo ago·March 23, 2026

✨ Added

Added a dedicated `benchctl` operator workflow section to `README.md`.
Documented how to use `benchctl` for:
listing registered benchmarks and managed presets
launching supervisor-managed runs
inspecting run and managed process status
opening the live operator TUI

🐛 Fixed

Replaced the non-portable local `@mariozechner/pi-tui` `file:` dependency with the published npm package.
Refreshed `package-lock.json` with `npm install` so installs no longer depend on a local sibling checkout layout.

📦 Scope

`v0.1.2` does not change runtime product behavior. It improves README operator-workflow discoverability and fixes standalone installation portability after `v0.1.1`.

v0.1.1

justram·2mo ago·March 23, 2026

🐛 Fixed

Hardened the detached-process runtime test in `tests/runtime_process.test.ts` to wait for actual stdout and stderr file contents before asserting success.
Eliminated an intermittent push-time test failure caused by treating a completion marker file as proof that detached stdout/stderr output had already flushed.
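The underlying idea, waiting on the output itself instead of on a marker file, can be sketched as a small polling helper. This is not the code in `tests/runtime_process.test.ts`; `waitForContent` and its options are hypothetical.

```typescript
// Poll a reader until it returns non-empty content or the timeout elapses.
async function waitForContent(
  read: () => Promise<string>,
  options: { timeoutMs: number; intervalMs: number },
): Promise<string> {
  const deadline = Date.now() + options.timeoutMs;
  for (;;) {
    // A failed read (for example, the file does not exist yet) counts as "no content yet".
    const content = await read().catch(() => "");
    if (content.length > 0) return content;
    if (Date.now() >= deadline) {
      throw new Error(`no content after ${options.timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}
```

In a test, `read` could wrap `fs.promises.readFile(path, "utf8")` for the stdout and stderr files. A completion marker only proves the child process reached a certain point, not that its redirected output has been flushed to disk, which is why asserting on the file contents directly is the more reliable signal.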

📦 Scope

`v0.1.1` does not change runtime product behavior. It is a test-stability patch release following `v0.1.0`.

v0.1.0

justram·2mo ago·March 23, 2026

📦 Highlights

Index-driven benchmark and agentic search workflows for:
MS MARCO v1 Passage with `dl19` and `dl20`
BrowseComp-Plus with `q9`, `q100`, `q300`, and `qfull`
BM25-backed `pi` search extension with search and document-reading flows over Lucene indexes
Shared BM25 RPC execution with single-process, shared-daemon, and sharded shared-daemon launch modes
Benchmark-aware retrieval evaluation, judge evaluation, summarization, and Markdown reporting
Reproducible run manifests via per-run `benchmark_manifest_snapshot.json`

📦 Benchmarks

`browsecomp-plus` — default packaged benchmark
`msmarco-v1-passage` — MS MARCO v1 passage support for `dl19` and `dl20`
`benchmark-template` — tiny local end-to-end demo benchmark for development and validation

📦 Platform capabilities

Typed benchmark registry and benchmark-aware query/qrels/index resolution
Node.js/TypeScript-first orchestration entrypoints under `src/orchestration/`
Managed launch presets and operator surfaces under `src/operator/`
Internal and `trec_eval`-backed retrieval evaluation paths
Judge evaluation with benchmark-aware mode defaults and validation
BM25 comparison and tuning tooling

📦 Scope note

This release is intentionally index-driven: benchmark runs execute against prepared Lucene indexes.
Document-ingestion-first indexing workflows built around Anserini `IndexCollection` are planned next, but are not part of `v0.1.0`.

📦 Getting started

```bash
npm run setup:browsecomp-plus
npm run setup:msmarco-v1-passage
BENCHMARK=msmarco-v1-passage \
QUERY_SET=dl19 \
MODEL=openai-codex/gpt-5.4-mini \
npm run run:benchmark:query-set
```
+ 5 more
