fastxyz/skill-optimizer
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
2 Releases
Latest: 2mo ago
v1.1.01.1.0Latest
✨ Added
- Prompt surface — benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown, evaluates output quality with content-based criteria (required sections, format patterns, forbidden keywords, code blocks).
- Codex auth — direct OpenAI model runs can use browser-login tokens or a static `OPENAI_API_KEY` stored by Codex (`~/.codex/auth.json`) instead of requiring an env var. Set `benchmark.authMode: "codex"` and `"format": "openai"` with `openai/<model>` IDs.
- SKILL folder — bundled AI-agent guidance (`SKILL/SKILL.md`) so agents can use skill-optimizer reliably without extra setup.
- Stable task IDs — IDs are now a SHA-1 hash of action names (SDK/CLI/MCP) or prompt text (prompt surface), so `--task <id>` filters work across regenerations (fixes [#17](https://github.com/fastxyz/skill-optimizer/issues/17)).
- Optimizer loop diagram — README includes a visual workflow diagram.
🐛 Fixed
- Anthropic tool names — dotted tool names (e.g. `auth.status`) are now sanitized to `auth_status` before sending to the Anthropic API and mapped back in responses. Fixes hard failures on tool-calling benchmarks against `anthropic/` models.
- Prompt eval on model error — prompt evaluator no longer runs when the model call itself failed; `toolPrecision` is now correctly set to `1.0` for prompt tasks (no tool calls = vacuously perfect precision).
- Config path — running without `--config` now looks for `.skill-optimizer/skill-optimizer.json`, matching what `init` scaffolds.
- Format/prefix validation — `validate` now errors when `benchmark.format: "openai"` is paired with non-`openai/` model IDs, and vice versa for `anthropic/`.
- Codex static key routing — a plain `OPENAI_API_KEY` in `~/.codex/auth.json` now correctly routes to the direct OpenAI transport instead of the JWT-only Codex transport. A malformed `access_token` (non-JWT) no longer shadows a valid static key fallback.
- Model IDs — OpenRouter slugs preserve dots (`openrouter/anthropic/claude-sonnet-4.6`); dot→hyphen rewrite applies only to `anthropic/` direct-API IDs; `openai/` slugs (e.g. `gpt-5.4`) are exempt.
- Provider prefix is stripped before sending model IDs to `anthropic/` and `openai/` direct APIs.
- Prompt-surface benchmarks no longer hard-fail on coverage violations; coverage is informational.
- + 1 more
💥 Breaking changes
- `CodeModeConfig` → `SdkSurfaceConfig`
- `McpModeConfig` → `McpSurfaceConfig`
- `ExpectedTool` → `ExpectedAction`
- `ToolMatch` → `ActionMatch`
- `LEGACY_PROJECT_CONFIG_NAME` → hard-code `".skill-optimizer/skill-optimizer.json"`
- `toLegacyOptimizeManifest` → removed
- `SurfaceSnapshotArg` → removed
- `TaskResult` fields: `toolMatches` → `actionMatches`, `hallucinatedCalls` → `hallucinatedActions`, `unnecessaryCalls` → `unnecessaryActions`. Re-run benchmark to regenerate report files.
- + 3 more
v1.0.01.0.0
