Kiln
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
A free app and open-source library to build better AI products. The project is written primarily in Python and was first published in 2024. The core Python library is MIT-licensed and the desktop app is source-available, which is why GitHub reports the repository license as "Other". It has 4,899 stars and 372 forks on GitHub. Key topics include: ai, chain-of-thought, collaboration, dataset-generation, evals.
What is Kiln?
Kiln is a workbench for the full AI development loop: evals, optimization, prompts, RAG, fine-tuning, synthetic data, agents, and tools, all working together. The desktop app lets your whole team contribute: PMs, subject-matter experts, and QA can rate outputs and add data without writing code. The MIT-licensed Python library ships the same tasks to production. Kiln runs locally: bring your own API keys, or go fully offline with Ollama.
Highlights
Iterate, optimize, and collaborate
- 🖥️ Intuitive app - Easy-to-use apps for Mac, Windows, and Linux. One-click install.
- 📊 Eval Builder - Auto-generate evals (judge + synthetic eval dataset), and align to your preference in ~10 minutes.
- 🚀 Auto-Optimize - Automatically find the best way to run your AI task, optimizing prompt, model selection, tools, skills, subagents, parameters, and more.
- 💬 AI Assistant - Your AI data-science partner. Kiln Assistant proposes improvements, optimizes prompts, runs experiments, creates evals, and more.
- 🤝 Git-native collaboration - The app syncs to Git automatically — even for teammates who don't know what Git is.
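To make the Auto-Optimize idea above concrete, here is a minimal sketch of an exhaustive search over prompt and model candidates, scored against a small eval set. It is not Kiln's implementation: `run_task`, `score`, the candidate lists, and the toy dataset are all hypothetical stand-ins, and a real optimizer would call actual models and judges.

```python
from itertools import product


def run_task(prompt: str, model: str, item: str) -> str:
    """Hypothetical model call: only the 'big' model follows the uppercase instruction."""
    if "uppercase" in prompt and model == "big":
        return item.upper()
    return item


def score(output: str, expected: str) -> float:
    """Exact-match scoring: 1.0 for a match, 0.0 otherwise."""
    return 1.0 if output == expected else 0.0


def best_config(prompts, models, dataset):
    """Score every (prompt, model) pair and return the best pair with its mean score."""
    results = {}
    for prompt, model in product(prompts, models):
        total = sum(score(run_task(prompt, model, x), y) for x, y in dataset)
        results[(prompt, model)] = total / len(dataset)
    best = max(results, key=results.get)
    return best, results[best]


# Made-up candidates and eval data, for illustration only.
prompts = ["Repeat the input.", "Repeat the input in uppercase."]
models = ["small", "big"]
dataset = [("a", "A"), ("b", "B")]

config, mean_score = best_config(prompts, models, dataset)
print(config, mean_score)
```

An exhaustive grid is the simplest possible strategy. It stops being practical once the number of candidates grows, which is where a smarter search earns its keep.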
Build & ship agents
- 🔍 RAG - Drag-and-drop docs (PDF, image, video, audio) to create a RAG pipeline. Auto-generate RAG evals from your own documents.
- 🤖 Subagents - Compose multi-agent hierarchies. Each runs in its own focused context window.
- 🪄 Synthetic Data Generation - Generate data for evals or fine-tuning in minutes.
- 🎛️ Fine-Tuning - Zero-code fine-tuning across 60+ models (Qwen, Llama, GPT, Gemini, …) on Fireworks, Together, and Vertex. Serverless deployment included.
- 🐍 Open Python library - Agents built in the app can be deployed to production. MIT open-source.
- 🧰 …and more - Tools & MCP, Skills, structured outputs, reasoning models, model library (190+ tested).
App Quickstart
Get started in minutes - one-click install.
Download Kiln Desktop for macOS, Windows, or Linux, then follow the 5-minute quickstart to run your first task.
Prefer to start in code? See the Python library quickstart.
Demo
Watch a 2-minute overview, or our end-to-end project demo (20 minutes).
Why Kiln?
Most AI tooling forces a tradeoff: a code-only framework that covers one slice (orchestration or evals or RAG), or a paid SaaS that locks in your data and can't be extended. Kiln is a free, local-first workbench where a single task and dataset flow through evals, prompt optimization, fine-tuning, RAG, agents, and synthetic data — all in one tool.
- One dataset, every technique. Define a task once. Evaluate it, optimize the prompt, fine-tune a model, generate synthetic data, add RAG, all against the same dataset, with results that compound across stages.
- Track every axis. Move fast. Don't regress. Keeping agents running well is hard: a prompt change quietly regresses behavior three steps downstream; a model upgrade improves five things and breaks two. Kiln tracks quality across every dimension you care about, so you iterate without breaking what already works.
<p align="center"> <img width="600" alt="Kiln optimization across iterations" src="https://github.com/user-attachments/assets/5517b33b-74dd-444a-9f40-6a9c6d8a1ffc"> </p>
- Optimization, not just evaluation. Other tools tell you how a prompt scores, but not how to fix it. Kiln's Auto-Optimize searches across hundreds of prompt mutations and models to find what works best for every eval dimension.
- GUI for the whole team, library for engineers. Kiln's desktop app lets PMs rate outputs, SMEs add training examples, and QA flag regressions, all without a terminal. Engineers ship the same tasks via an MIT-licensed Python library. Data scientists can use the library in notebooks and experiments.
- Local-first. Most AI platforms are SaaS-only. Kiln runs entirely on your machine. Bring your own API keys, or go fully offline with Ollama. Your data never leaves your control. Team-sync is provided via Git infrastructure you already own.
- 190+ models tested across every provider. Skip the guesswork: we've tested each model's capabilities across the major providers, including OpenAI, Anthropic, Gemini, Bedrock, Ollama, OpenRouter, Fireworks, Groq, any OpenAI-compatible endpoint, and more. Swap models with confidence.
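The "track every axis" point above comes down to comparing scores per eval dimension between two versions of a task. The following is a minimal sketch of that comparison, not Kiln's implementation: the function name `find_regressions`, the dimension names, the scores, and the tolerance are all assumptions made up for illustration.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return {dimension: delta} where the candidate dropped by more than `tolerance`."""
    regressions = {}
    for dim, base_score in baseline.items():
        if dim not in candidate:
            continue  # dimension not measured for the candidate; skipped in this sketch
        delta = candidate[dim] - base_score
        if delta < -tolerance:
            regressions[dim] = round(delta, 4)
    return regressions


# Hypothetical scores in [0, 1] for two versions of the same task.
baseline = {"accuracy": 0.90, "tone": 0.80, "latency_ok": 0.95}
candidate = {"accuracy": 0.94, "tone": 0.70, "latency_ok": 0.94}

regressions = find_regressions(baseline, candidate)
print(regressions)
```

Here the candidate improves accuracy but drops ten points on tone, so only `tone` is flagged. The small dip in `latency_ok` stays within the tolerance.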
Open-source Python Library
Build AI tasks in the app. Deploy with the open-source library. Same engine, same project files, no rewrite. The MIT-licensed kiln-ai library is the same library used in the app. Load Kiln projects, run tasks, build fine-tunes, work in notebooks, integrate Pandas/Polars dataframes, and more.
pip install kiln-ai
📚 Library docs · REST API · PyPI
Docs
Full docs at docs.kiln.tech. Common starting points:
- Quickstart — run your first task in 5 minutes
- Evals
- Auto-Optimize
- RAG
- Agents
- Fine-Tuning
- Python Library
- End-to-end project demo (20-min video)
Community
- Chat with the community on Discord.
- Subscribe to the newsletter for new features.
- File issues, request features, or open a discussion on GitHub.
Contributing
See CONTRIBUTING.md for development setup and contribution guidelines.
License & Trademarks
Kiln's core Python library and REST server are MIT-licensed. The desktop app is source-available, free to use, and built on the fair-code model — so Kiln stays free for individuals while remaining sustainable.
Datasets are open JSON. You own and control your datasets.
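Because the datasets are plain JSON, they can be inspected with nothing but the Python standard library. The sketch below illustrates the idea against a throwaway directory. The directory layout, the `*.json` file pattern, and the `input`, `output`, and `rating` fields are assumptions for this example, so check a real Kiln project for its actual file extension and schema.

```python
import json
import tempfile
from pathlib import Path


def load_json_records(root: Path, pattern: str = "*.json") -> list[dict]:
    """Recursively load every JSON object file under `root` that matches `pattern`."""
    records = []
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, dict):
            records.append(data)
    return records


# Demo with a made-up layout and made-up fields (not Kiln's real schema).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "runs").mkdir()
    (root / "runs" / "run_1.json").write_text(
        json.dumps({"input": "2+2", "output": "4", "rating": 5}), encoding="utf-8"
    )
    (root / "runs" / "run_2.json").write_text(
        json.dumps({"input": "3+3", "output": "7", "rating": 1}), encoding="utf-8"
    )
    records = load_json_records(root)

# Keep only the inputs whose outputs were rated 4 or higher.
well_rated = [r["input"] for r in records if r["rating"] >= 4]
print(well_rated)
```

For typed access to projects, tasks, and runs, the kiln-ai library is the better route. This sketch only shows that the files remain readable without it.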
Kiln Pro is our service that adds the AI Assistant, Auto-Optimize, and the Eval Builder. It's opt-in, and the core Kiln app remains fully functional without it.
The Kiln name and logos are trademarks of Chesterfield Laboratories Inc.
Copyright 2024 — Chesterfield Laboratories Inc.
