fastxyz/skill-optimizer

Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs

2 Releases
v1.1.0 (Latest)
damienen · April 18, 2026

Added

  • Prompt surface — benchmark and optimize prompt templates, Claude Code skills, and agent instructions. Discovers phases and capabilities from markdown, evaluates output quality with content-based criteria (required sections, format patterns, forbidden keywords, code blocks).
  • Codex auth — direct OpenAI model runs can use browser-login tokens or a static `OPENAI_API_KEY` stored by Codex (`~/.codex/auth.json`) instead of requiring an env var. Set `benchmark.authMode: "codex"` and `"format": "openai"` with `openai/<model>` IDs.
  • SKILL folder — bundled AI-agent guidance (`SKILL/SKILL.md`) so agents can use skill-optimizer reliably without extra setup.
  • Stable task IDs — IDs are now a SHA-1 hash of action names (SDK/CLI/MCP) or prompt text (prompt surface), so `--task <id>` filters work across regenerations (fixes [#17](https://github.com/fastxyz/skill-optimizer/issues/17)).
  • Optimizer loop diagram — README includes a visual workflow diagram.
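The stable task ID entry above can be illustrated with a minimal sketch. It assumes IDs are the hex SHA-1 digest of a canonical string built from the action names, or from the prompt text on the prompt surface. The function names, the sorting step, the newline separator, and the trimming are assumptions made for illustration, not the project's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a task ID from the action names a task expects.
// Sorting and the "\n" separator are assumptions, chosen so that the same set
// of names always hashes to the same ID regardless of generation order.
function stableTaskId(actionNames: string[]): string {
  const canonical = [...actionNames].sort().join("\n");
  return createHash("sha1").update(canonical, "utf8").digest("hex");
}

// Hypothetical helper for the prompt surface: hash the prompt text itself.
// Trimming surrounding whitespace is an assumption of this sketch.
function stablePromptTaskId(promptText: string): string {
  return createHash("sha1").update(promptText.trim(), "utf8").digest("hex");
}
```

Because the ID depends only on the hashed input, regenerating the same task yields the same ID, which is what lets a `--task <id>` filter keep matching across regenerations.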

🐛 Fixed

  • Anthropic tool names — dotted tool names (e.g. `auth.status`) are now sanitized to `auth_status` before sending to the Anthropic API and mapped back in responses. Fixes hard failures on tool-calling benchmarks against `anthropic/` models.
  • Prompt eval on model error — prompt evaluator no longer runs when the model call itself failed; `toolPrecision` is now correctly set to `1.0` for prompt tasks (no tool calls = vacuously perfect precision).
  • Config path — running without `--config` now looks for `.skill-optimizer/skill-optimizer.json`, matching what `init` scaffolds.
  • Format/prefix validation — `validate` now errors when `benchmark.format: "openai"` is paired with non-`openai/` model IDs, and vice versa for `anthropic/`.
  • Codex static key routing — a plain `OPENAI_API_KEY` in `~/.codex/auth.json` now correctly routes to the direct OpenAI transport instead of the JWT-only Codex transport. A malformed `access_token` (non-JWT) no longer shadows a valid static key fallback.
  • Model IDs — OpenRouter slugs preserve dots (`openrouter/anthropic/claude-sonnet-4.6`); dot→hyphen rewrite applies only to `anthropic/` direct-API IDs; `openai/` slugs (e.g. `gpt-5.4`) are exempt.
  • Provider prefix is stripped before sending model IDs to `anthropic/` and `openai/` direct APIs.
  • Prompt-surface benchmarks no longer hard-fail on coverage violations; coverage is informational.
  • + 1 more
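The Anthropic tool-name fix above amounts to a reversible rename. The following is a minimal sketch of that idea, assuming a simple dot-to-underscore replacement and a lookup table for mapping names back. `buildToolNameMap` and the tool name `wallet.balance` are hypothetical, and the collision check is an assumption of this sketch rather than documented behavior.

```typescript
// Hypothetical helper: build a two-way mapping between original tool names
// and the sanitized names sent to the API.
function buildToolNameMap(names: string[]): {
  toApi: Map<string, string>;
  fromApi: Map<string, string>;
} {
  const toApi = new Map<string, string>();
  const fromApi = new Map<string, string>();
  for (const name of names) {
    // Replace every dot with an underscore, e.g. "auth.status" -> "auth_status".
    const safe = name.replace(/\./g, "_");
    // Assumption: reject two distinct names that collapse to the same sanitized name,
    // since the reverse mapping would otherwise be ambiguous.
    const clash = fromApi.get(safe);
    if (clash !== undefined && clash !== name) {
      throw new Error(`"${name}" and "${clash}" both sanitize to "${safe}"`);
    }
    toApi.set(name, safe);
    fromApi.set(safe, name);
  }
  return { toApi, fromApi };
}

const { toApi, fromApi } = buildToolNameMap(["auth.status", "wallet.balance"]);
console.log(toApi.get("auth.status")); // auth_status
console.log(fromApi.get("auth_status")); // auth.status
```

Keeping the reverse map alongside the forward one means a tool call in a response can be translated back to the original dotted name before it is compared with the expected actions.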

💥 Breaking changes

  • `CodeModeConfig` → `SdkSurfaceConfig`
  • `McpModeConfig` → `McpSurfaceConfig`
  • `ExpectedTool` → `ExpectedAction`
  • `ToolMatch` → `ActionMatch`
  • `LEGACY_PROJECT_CONFIG_NAME` → hard-code `".skill-optimizer/skill-optimizer.json"`
  • `toLegacyOptimizeManifest` → removed
  • `SurfaceSnapshotArg` → removed
  • `TaskResult` fields: `toolMatches` → `actionMatches`, `hallucinatedCalls` → `hallucinatedActions`, `unnecessaryCalls` → `unnecessaryActions`. Re-run benchmark to regenerate report files.
  • + 3 more
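For code that still reads report objects written before this release, the `TaskResult` field renames above could be applied with a small adapter. This is only a sketch: `renameTaskResultFields` is a hypothetical helper, it treats a result as a plain parsed JSON object, and it covers only the three renames listed above. It is not a substitute for re-running the benchmark to regenerate report files.

```typescript
// Field renames taken from the TaskResult entry above: old name -> new name.
const RENAMED_FIELDS = new Map<string, string>([
  ["toolMatches", "actionMatches"],
  ["hallucinatedCalls", "hallucinatedActions"],
  ["unnecessaryCalls", "unnecessaryActions"],
]);

// Hypothetical helper: return a shallow copy of a parsed result object
// with the old field names replaced by the new ones.
function renameTaskResultFields(
  result: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(result)) {
    // Keys without a rename entry are copied through unchanged.
    out[RENAMED_FIELDS.get(key) ?? key] = value;
  }
  return out;
}
```

The adapter leaves the input untouched and returns a new object, so it can be applied at the point where an old report is loaded without affecting other consumers.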

v1.0.0
damienen · April 14, 2026