fastxyz/skill-optimizer

Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

2 Releases

Latest: 2mo ago

v1.1.01.1.0Latest

damienen·2mo ago·April 18, 2026

✨ Added

Prompt surface — benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown, evaluates output quality with content-based criteria (required sections, format patterns, forbidden keywords, code blocks).
Codex auth — direct OpenAI model runs can use browser-login tokens or a static `OPENAI_API_KEY` stored by Codex (`~/.codex/auth.json`) instead of requiring an env var. Set `benchmark.authMode: "codex"` and `"format": "openai"` with `openai/<model>` IDs.
SKILL folder — bundled AI-agent guidance (`SKILL/SKILL.md`) so agents can use skill-optimizer reliably without extra setup.
Stable task IDs — IDs are now a SHA-1 hash of action names (SDK/CLI/MCP) or prompt text (prompt surface), so `--task <id>` filters work across regenerations (fixes [#17](https://github.com/fastxyz/skill-optimizer/issues/17)).
Optimizer loop diagram — README includes a visual workflow diagram.

🐛 Fixed

Anthropic tool names — dotted tool names (e.g. `auth.status`) are now sanitized to `auth_status` before sending to the Anthropic API and mapped back in responses. Fixes hard failures on tool-calling benchmarks against `anthropic/` models.
Prompt eval on model error — prompt evaluator no longer runs when the model call itself failed; `toolPrecision` is now correctly set to `1.0` for prompt tasks (no tool calls = vacuously perfect precision).
Config path — running without `--config` now looks for `.skill-optimizer/skill-optimizer.json`, matching what `init` scaffolds.
Format/prefix validation — `validate` now errors when `benchmark.format: "openai"` is paired with non-`openai/` model IDs, and vice versa for `anthropic/`.
Codex static key routing — a plain `OPENAI_API_KEY` in `~/.codex/auth.json` now correctly routes to the direct OpenAI transport instead of the JWT-only Codex transport. A malformed `access_token` (non-JWT) no longer shadows a valid static key fallback.
Model IDs — OpenRouter slugs preserve dots (`openrouter/anthropic/claude-sonnet-4.6`); dot→hyphen rewrite applies only to `anthropic/` direct-API IDs; `openai/` slugs (e.g. `gpt-5.4`) are exempt.
Provider prefix is stripped before sending model IDs to `anthropic/` and `openai/` direct APIs.
Prompt-surface benchmarks no longer hard-fail on coverage violations; coverage is informational.
+ 1 more

💥 Breaking changes

`CodeModeConfig` → `SdkSurfaceConfig`
`McpModeConfig` → `McpSurfaceConfig`
`ExpectedTool` → `ExpectedAction`
`ToolMatch` → `ActionMatch`
`LEGACY_PROJECT_CONFIG_NAME` → hard-code `".skill-optimizer/skill-optimizer.json"`
`toLegacyOptimizeManifest` → removed
`SurfaceSnapshotArg` → removed
`TaskResult` fields: `toolMatches` → `actionMatches`, `hallucinatedCalls` → `hallucinatedActions`, `unnecessaryCalls` → `unnecessaryActions`. Re-run benchmark to regenerate report files.
+ 3 more

v1.0.01.0.0

damienen·2mo ago·April 14, 2026