OllamaMQ
High-performance Ollama & LM Studio proxy with per-user fair-share queuing, round-robin scheduling, and a real-time TUI dashboard. Built in Rust.
`ollamaMQ` is a high-performance, asynchronous message queue dispatcher and load balancer designed to sit in front of one or more [Ollama](https://ollama.ai/) or [LM Studio](https://lmstudio.ai/) API instances. It acts as a smart proxy that queues incoming requests from multiple users and dispatches them in parallel to multiple backends using a fair-share round-robin scheduler with least-connections load balancing. The project is written primarily in Rust, distributed under the MIT License, and was first published in 2026. Key topics include fair-share, llm-ops, message-queue, ollama, and openai-compatible.
Features
- Multi-Backend Load Balancing: Distribute requests across multiple Ollama or LM Studio instances using a Least Connections + Round Robin strategy. Automatically detects backend API type (Ollama `/api/*` vs OpenAI `/v1/*`) and routes each request to a compatible backend.
- Model-Aware Routing: Automatically identifies the requested model from the request body and routes the request only to backends that have that specific model loaded. This prevents 404 errors when different models are distributed across multiple backends.
- Smart Model Matching: Robust matching that handles common variations like `:latest` tags and case-insensitivity. For example, a request for `llama3` will correctly match `llama3:latest` on the backend.
- Parallel Processing: Unlike basic proxies, `ollamaMQ` can process multiple requests simultaneously (one per available backend), significantly increasing throughput for multiple users.
- Backend Health Checks: Automatically monitors backend status every 10 seconds. Probes for both API type (Ollama vs OpenAI) and the list of currently available models (via `/api/tags` and `/v1/models`). Offline instances are temporarily skipped and marked in the TUI.
- Per-User Queuing: Each user (identified by the `X-User-ID` header) has their own FIFO queue.
- Fair-Share Scheduling: Prevents any single user from monopolizing all available backends.
- Transparent Header Forwarding: Full support for all HTTP headers (including `X-User-ID`) passed to and from the backend, ensuring compatibility with tools like Claude Code.
- VIP & Boost Modes: Absolute priority (VIP) or increased frequency (Boost) for specific users.
- Real-Time TUI Dashboard: Monitor backend health, active requests, queue depths, and throughput in real-time.
- OpenAI Compatibility: Supports standard OpenAI-compatible endpoints.
- Async Architecture: Built on `tokio` and `axum` for high concurrency.
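To make the Smart Model Matching and Model-Aware Routing items more concrete, here is a minimal Rust sketch of how such matching could work. It is not taken from the ollamaMQ source: the helper names `normalize_model`, `model_matches`, and `compatible_backends` are hypothetical, and the only rules it applies are the two named above (case-insensitivity and an implicit `:latest` tag).

```rust
/// Normalize a model name for comparison: trim it, lowercase it, and drop
/// a trailing ":latest" tag. Hypothetical helper, not ollamaMQ's own code.
fn normalize_model(name: &str) -> String {
    let lower = name.trim().to_lowercase();
    lower
        .strip_suffix(":latest")
        .unwrap_or(lower.as_str())
        .to_string()
}

/// True when the requested name and a backend-reported name refer to the
/// same model under the normalization above.
fn model_matches(requested: &str, available: &str) -> bool {
    normalize_model(requested) == normalize_model(available)
}

/// Return the indices of the backends whose model list contains the
/// requested model. Each inner Vec stands for one backend's model list.
fn compatible_backends(requested: &str, backends: &[Vec<String>]) -> Vec<usize> {
    backends
        .iter()
        .enumerate()
        .filter(|(_, models)| models.iter().any(|m| model_matches(requested, m)))
        .map(|(i, _)| i)
        .collect()
}
```

With this sketch, a request for `llama3` is routed only to backends that report `llama3` or `llama3:latest`, while a backend that only has `llama3:8b` is skipped.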

Installation
Ensure you have Rust (2024 edition or later) and Ollama installed.
Option 1: Install via Cargo (Recommended)
```bash
cargo install ollamaMQ
```
Option 2: From Source
- Clone the repository:
  ```bash
  git clone https://github.com/Chleba/ollamaMQ.git
  cd ollamaMQ
  ```
- Build and install locally:
  ```bash
  cargo install --path .
  ```
Usage
Docker Installation
Using Docker Compose (Recommended)
- Ensure Docker and Docker Compose are installed.
- Start your local Ollama instance (defaulting to `localhost:11434`).
- Run:
  ```bash
  docker compose up -d
  ```
Using Docker CLI
First build the image from the local Dockerfile:
```bash
docker build -t chlebon/ollamamq .
```
Then run the container:
```bash
docker run -d \
  --name ollamamq \
  -p 11435:11435 \
  --restart unless-stopped \
  chlebon/ollamamq
```
Command Line Arguments
ollamaMQ supports several options to configure the proxy:
- `-p, --port <PORT>`: Port to listen on (default: `11435`)
- `-o, --backend-urls <URL1,URL2>`: Comma-separated list of backend server URLs (Ollama, LM Studio, etc.) (default: `http://localhost:11434`)
- `-t, --timeout <SECONDS>`: Request timeout in seconds (default: `300`)
- `--no-tui`: Disable the interactive TUI dashboard (useful for Docker/CI)
- `--allow-all-routes`: Enable fallback proxy for non-standard endpoints
- `-h, --help`: Print help message
- `-V, --version`: Print version information
Example:
```bash
ollamaMQ --port 8080 -o http://10.0.0.1:11434,http://10.0.0.2:11434 --timeout 600
```
Docker Example:
```bash
docker run -d \
  --name ollamamq \
  -p 8080:8080 \
  chlebon/ollamamq --port 8080 -o http://192.168.1.5:11434 --timeout 600
```
API Proxying
Point your LLM clients at the ollamaMQ port (`11435` by default) and include the `X-User-ID` header.
Supported Endpoints:
- `GET /health` (Internal health check)
- `GET /` (Backend Status)
- `POST /api/generate`
- `POST /api/chat`
- `POST /api/embed`
- `POST /api/embeddings`
- `GET /api/tags`
- `POST /api/show`
- `POST /api/create`
- `POST /api/copy`
- `DELETE /api/delete`
- `POST /api/pull`
- `POST /api/push`
- `GET/HEAD/POST /api/blobs/{digest}`
- `GET /api/ps`
- `GET /api/version`
- `POST /v1/chat/completions` (OpenAI Compatible)
- `POST /v1/completions` (OpenAI Compatible)
- `POST /v1/embeddings` (OpenAI Compatible)
- `GET /v1/models` (OpenAI Compatible)
- `GET /v1/models/{model}` (OpenAI Compatible)
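The endpoint list falls into two families, Ollama-style `/api/*` paths and OpenAI-style `/v1/*` paths, and requests are routed to a backend that speaks the matching API. A minimal Rust sketch of that classification might look like the following. The `ApiKind` enum and `classify_path` function are illustrative names, not the project's actual types, and treating everything else (such as `/health` and `/`) as `Other` is an assumption.

```rust
/// Which API family a request path belongs to. Illustrative type only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApiKind {
    /// Ollama-native endpoints such as /api/chat.
    Ollama,
    /// OpenAI-compatible endpoints such as /v1/chat/completions.
    OpenAi,
    /// Anything else, e.g. /health or / (assumed to be handled separately).
    Other,
}

/// Classify a request path by its prefix.
fn classify_path(path: &str) -> ApiKind {
    if path.starts_with("/api/") {
        ApiKind::Ollama
    } else if path.starts_with("/v1/") {
        ApiKind::OpenAi
    } else {
        ApiKind::Other
    }
}
```

A dispatcher could compare the result with the API type that the health check detected for each backend and skip backends of the other kind.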
Example (cURL):
```bash
curl -X POST http://localhost:11435/api/chat \
  -H "X-User-ID: developer-1" \
  -d '{
    "model": "qwen3.5:35b",
    "messages": [{"role": "user", "content": "Explain quantum computing."}],
    "stream": true
  }'
```
Dashboard Controls
The interactive TUI dashboard provides a live view of the dispatcher's state:
- `j`/`k` or Arrows: Navigate the selected list (Users, Backends, or Blocked Items).
- `Tab` or `h`/`l`: Switch between the Backends, Users, and Blocked panels.
- `Space` or `Enter`: Expand/collapse the available models list for the selected backend (in the Backends panel).
- `p`: Toggle VIP status for the selected user (absolute priority).
- `b`: Toggle Boost status for the selected user (prioritizes every 2nd request).
- `x`: Block the selected user.
- `X`: Block the selected user's IP address.
- `u`: Unblock the selected user or IP (works in both panels).
- `q` or `Esc`: Exit the dashboard and stop the application.
- `?`: Toggle detailed help overlay.
Visual Indicators:
- `▶`/`▼`: Indicates whether a backend's model list is collapsed or expanded.
- Magenta marker: VIP User (absolute priority).
- `⚡` (Yellow): Boosted User (every 2nd request priority).
- `▶` (Cyan): Request is currently being processed/streamed.
- Green marker: Backend is Online or User has requests waiting in the queue.
- Gray marker: User is idle or Backend is Offline.
- Red marker: User or IP is blocked.
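The VIP and Boost toggles change which waiting user is served next. The exact rule is not spelled out here, so the Rust sketch below shows only one plausible reading: VIP users always go first, and every second dispatch slot is offered to a boosted user. The `Priority` enum, the `Waiting` struct, and `pick_next` are hypothetical names introduced for illustration.

```rust
/// Scheduling class of a user. Illustrative type only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Priority {
    Normal,
    Boost,
    Vip,
}

/// A user with at least one queued request, listed in rotation order.
struct Waiting<'a> {
    user: &'a str,
    priority: Priority,
}

/// Choose the next user to serve.
/// `dispatch_count` is the number of requests dispatched so far.
/// Assumed rule: VIP first; on odd counts (every 2nd slot) prefer a
/// boosted user; otherwise take the first user in rotation order.
fn pick_next<'a>(waiting: &[Waiting<'a>], dispatch_count: u64) -> Option<&'a str> {
    if let Some(w) = waiting.iter().find(|w| w.priority == Priority::Vip) {
        return Some(w.user);
    }
    if dispatch_count % 2 == 1 {
        if let Some(w) = waiting.iter().find(|w| w.priority == Priority::Boost) {
            return Some(w.user);
        }
    }
    waiting.first().map(|w| w.user)
}
```

Under this reading a boosted user still shares the queue with everyone else, whereas a VIP user with pending requests is always served ahead of the rotation.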
Logging
Logs are automatically written to ollamamq.log in the current working directory. This keeps the terminal clear for the TUI dashboard while allowing you to monitor system events and debug backend communication.
Docker
Docker Compose
The included docker-compose.yml provides a ready-to-use configuration:
```yaml
services:
  ollamamq:
    build: .
    image: chlebon/ollamamq:latest
    container_name: ollamamq
    ports:
      - "11435:11435"
    environment:
      - OLLAMA_URLS=http://host.docker.internal:11434
      - PORT=11435
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
```
Note for Linux Users:
When running in Docker on Linux to access a host-based Ollama:
- Listen on all interfaces: Ollama must be configured to listen on `0.0.0.0`. You can do this by setting `export OLLAMA_HOST=0.0.0.0` before starting the Ollama service (or editing the systemd unit file).
- Firewall: Ensure your firewall (e.g., `ufw`) allows traffic from the Docker bridge network (usually `172.17.0.0/16`) to port `11434`.
- Host Gateway: The `extra_hosts` setting in `docker-compose.yml` maps `host.docker.internal` to your host's IP address.
Dockerfile
The Dockerfile uses a multi-stage build:
- Build stage: Uses `rust:1.85-alpine` to compile the release binary
- Runtime stage: Uses `alpine:3.20` with only `ca-certificates` for a minimal footprint (~10MB)
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `OLLAMA_URLS` | URLs of the Ollama servers | `http://localhost:11434` |
| `PORT` | Port for ollamaMQ to listen on | `11435` |
| `TIMEOUT` | Request timeout in seconds | `300` |
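As an illustration of how these variables and their defaults could be resolved, here is a short Rust sketch. It is not the project's actual configuration code: the `Config` struct and `load_config` function are hypothetical, the comma-separated format for `OLLAMA_URLS` is assumed to mirror the command-line option, and invalid numbers silently fall back to the defaults for brevity.

```rust
use std::time::Duration;

/// Resolved proxy settings. Hypothetical struct for illustration.
#[derive(Debug, PartialEq)]
struct Config {
    backend_urls: Vec<String>,
    port: u16,
    timeout: Duration,
}

/// Build a Config from a variable lookup function. In a real program the
/// lookup would be `|k| std::env::var(k).ok()`; taking it as a parameter
/// keeps the sketch testable without touching the process environment.
fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Config {
    let urls = lookup("OLLAMA_URLS").unwrap_or_else(|| "http://localhost:11434".to_string());
    // Assumption: several backends are given as a comma-separated list.
    let backend_urls: Vec<String> = urls
        .split(',')
        .map(str::trim)
        .filter(|u| !u.is_empty())
        .map(String::from)
        .collect();
    // Unparseable values fall back to the documented defaults.
    let port: u16 = lookup("PORT").and_then(|p| p.parse().ok()).unwrap_or(11435);
    let secs: u64 = lookup("TIMEOUT").and_then(|t| t.parse().ok()).unwrap_or(300);
    Config {
        backend_urls,
        port,
        timeout: Duration::from_secs(secs),
    }
}
```

Calling `load_config(|k| std::env::var(k).ok())` with no variables set yields the defaults from the table above.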
Connecting to Different Ollama Servers
Local Ollama (on host machine)
```bash
docker run -d \
  --name ollamamq \
  -p 11435:11435 \
  -e OLLAMA_URLS=http://host.docker.internal:11434 \
  chlebon/ollamamq
```
Remote Ollama Server
```bash
docker run -d \
  --name ollamamq \
  -p 11435:11435 \
  -e OLLAMA_URLS=https://ollama.example.com:11434 \
  chlebon/ollamamq
```
Custom Port on Same Server
```bash
docker run -d \
  --name ollamamq \
  -p 8080:8080 \
  -e OLLAMA_URLS=http://host.docker.internal:11436 \
  -e PORT=8080 \
  chlebon/ollamamq
```
Ollama in Docker (different container)
```bash
docker run -d \
  --name ollamamq \
  --network ollama-network \
  -p 11435:11435 \
  -e OLLAMA_URLS=http://ollama:11434 \
  chlebon/ollamamq
```
Port Configuration
- 11435: The proxy port that clients connect to (exposed by default)
- 11434: The Ollama server port (internal, not exposed)
To change the proxy port, use the PORT environment variable:
```bash
docker run -d \
  --name ollamamq \
  -p 8080:8080 \
  -e PORT=8080 \
  chlebon/ollamamq
```
Architecture
- `src/main.rs`: Entry point, HTTP server initialization, and TUI lifecycle management.
- `src/dispatcher.rs`: Core logic for queuing, round-robin scheduling, and Ollama proxying.
- `src/tui.rs`: Implementation of the terminal-based monitoring dashboard.
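As a rough illustration of the kind of logic a dispatcher like this needs, the following Rust sketch picks a backend using least connections with a round-robin tie-break and skips offline backends. It is a simplified, single-threaded stand-in under stated assumptions, not the contents of `src/dispatcher.rs`: the `Backend` and `Balancer` types and their fields are hypothetical.

```rust
/// One upstream server as seen by the balancer. Hypothetical type.
#[allow(dead_code)]
struct Backend {
    url: String,
    online: bool,
    active: usize, // requests currently in flight
}

struct Balancer {
    backends: Vec<Backend>,
    cursor: usize, // index where the next scan starts (round-robin)
}

impl Balancer {
    fn new(urls: &[&str]) -> Self {
        let backends = urls
            .iter()
            .map(|u| Backend { url: u.to_string(), online: true, active: 0 })
            .collect();
        Balancer { backends, cursor: 0 }
    }

    /// Pick the online backend with the fewest in-flight requests.
    /// Ties go to the backend that comes first when scanning from the
    /// rotating cursor, which spreads equal load in round-robin order.
    fn pick(&mut self) -> Option<usize> {
        let n = self.backends.len();
        let mut best: Option<usize> = None;
        for offset in 0..n {
            let i = (self.cursor + offset) % n;
            let b = &self.backends[i];
            if !b.online {
                continue;
            }
            match best {
                Some(j) if self.backends[j].active <= b.active => {}
                _ => best = Some(i),
            }
        }
        if let Some(i) = best {
            self.backends[i].active += 1;
            self.cursor = (i + 1) % n;
        }
        best
    }

    /// Mark one request on backend `i` as finished.
    fn release(&mut self, i: usize) {
        self.backends[i].active = self.backends[i].active.saturating_sub(1);
    }
}
```

A real async dispatcher would guard this state with a lock or keep it inside a single worker task, and would also filter candidates by API type and loaded model before calling something like `pick`.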
Request Flow
- Client sends a request with `X-User-ID`.
- `ollamaMQ` pushes the request into a user-specific queue.
- The background worker checks for available backends (Online & not busy).
- If a backend is free, the worker pops the next task (fair-share rotation) and spawns a parallel task.
- The request is proxied to the selected Ollama backend.
- The response is streamed back to the client in real-time, while the worker can immediately start another task on a different backend.
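The queuing and fair-share rotation in this flow can be sketched in a few lines of Rust. This is an illustrative model under stated assumptions rather than the project's implementation: `FairQueue` is a hypothetical name, the rotation is a plain round-robin over users with pending work, and VIP/Boost handling and async plumbing are left out.

```rust
use std::collections::{HashMap, VecDeque};

/// Per-user FIFO queues plus a rotation of users that have pending work.
/// Invariant: a user is in `rotation` exactly when their queue is non-empty.
struct FairQueue<T> {
    queues: HashMap<String, VecDeque<T>>,
    rotation: VecDeque<String>,
}

impl<T> FairQueue<T> {
    fn new() -> Self {
        FairQueue { queues: HashMap::new(), rotation: VecDeque::new() }
    }

    /// Append a task to the user's own queue. A user who had nothing
    /// pending joins the back of the rotation.
    fn push(&mut self, user: &str, task: T) {
        let q = self.queues.entry(user.to_string()).or_default();
        if q.is_empty() {
            self.rotation.push_back(user.to_string());
        }
        q.push_back(task);
    }

    /// Take the oldest task of the user at the front of the rotation,
    /// then move that user to the back if they still have work queued.
    fn pop(&mut self) -> Option<(String, T)> {
        let user = self.rotation.pop_front()?;
        let q = self.queues.get_mut(&user)?;
        let task = q.pop_front()?;
        if q.is_empty() {
            self.queues.remove(&user);
        } else {
            self.rotation.push_back(user.clone());
        }
        Some((user, task))
    }
}
```

If one user queues three requests and a second user then queues one, the second user's request is served right after the first user's first request instead of waiting behind all three. That is the property that keeps a single user from monopolizing the backends.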
Publishing to Docker Hub
To publish a new version of ollamaMQ to Docker Hub, follow these steps:
- Update Version: Update the version number in `Cargo.toml`.
- Build and Tag:
  ```bash
  # Build the image for the current version
  docker build -t chlebon/ollamamq:v0.2.4 .
  # Tag it as latest
  docker tag chlebon/ollamamq:v0.2.4 chlebon/ollamamq:latest
  ```
- Push to Hub:
  ```bash
  # Log in to Docker Hub (if not already logged in)
  docker login
  # Push the versioned tag
  docker push chlebon/ollamamq:v0.2.4
  # Push the latest tag
  docker push chlebon/ollamamq:latest
  ```
Development
Stress Testing
You can use the provided test_dispatcher.sh script to simulate multiple users and verify the dispatcher's behavior under load:
```bash
./test_dispatcher.sh
```

License
This project is licensed under the MIT License - see the LICENSE file for details (if applicable).
