
Nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.

From autonomi-ai · Updated January 19, 2026

**nos** is a fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware. The project is written primarily in Python, is distributed under the Apache License 2.0, and was first published in 2023. Key topics include computer-vision, generative-ai, inference, inference-acceleration, and llm-inference.

Latest release: v0.3.0
<center><img src="./docs/assets/nos-header.svg" alt="Nitro Boost for your AI Infrastructure"></center> <p align="center"> <a href="https://docs.nos.run/"><b>Website</b></a> | <a href="https://docs.nos.run/"><b>Docs</b></a> | <a href="https://github.com/autonomi-ai/nos/tree/main/examples/tutorials"><b>Tutorials</b></a> | <a href="https://github.com/autonomi-ai/nos-playground"><b>Playground</b></a> | <a href="https://docs.nos.run/docs/blog"><b>Blog</b></a> | <a href="https://discord.gg/QAGgvTuvgg"><b>Discord</b></a> </p>

NOS is a fast and flexible PyTorch inference server that runs on any cloud or AI HW.

🛠️ Key Features

  • 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve, and auto-scale PyTorch models in production without compromising on developer experience.
  • 🥷 Multi-modal & Multi-model: Serve multiple foundation models (LLMs, Diffusion, Embeddings, Speech-to-Text, and Object Detection) simultaneously in a single server.
  • ⚙️ HW-aware Runtime: Deploy PyTorch models effortlessly on modern AI accelerators (NVIDIA GPUs, AWS Inferentia2, and even CPUs, with AMD support coming soon).
  • ☁️ Cloud-agnostic Containers: Run on any cloud (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.

🔥 What's New

🚀 Quickstart

We highly recommend that you go to our quickstart guide to get started. To install the NOS client, you can run the following command:

```bash
conda create -n nos python=3.8 -y
conda activate nos
pip install torch-nos
```

Once the client is installed, you can start the NOS server with the `nos serve` CLI. It automatically detects your local environment, downloads the Docker runtime image, and spins up the NOS server:

```bash
nos serve up --http --logging-level INFO
```

You are now ready to run your first inference request with NOS! Try any of the examples below. For more detailed server logs, start the server with `--logging-level DEBUG`.

👩‍💻 What can NOS do?

💬 Chat / LLM Agents (ChatGPT-as-a-Service)


NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite OpenAI-compatible LLM client to talk to NOS.

<img src="docs/assets/llama_nos.gif" width="400"> <br> <details> <summary> API / Usage</summary> <br>

<b>gRPC API ⚡</b>

```python
from nos.client import Client

client = Client()

# Load the chat model and stream back the generated response
model = client.Module("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)
```

<b>REST API</b>

```bash
curl \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "Tell me a story of 1000 words with emojis"}],
    "temperature": 0.7,
    "stream": true
  }'
```
</details>
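With `"stream": true`, the REST endpoint sends the reply back incrementally. The sketch below shows one way to reassemble the text on the client side. It assumes NOS emits OpenAI-style server-sent events (`data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`); `iter_stream_deltas` is a hypothetical helper and the sample stream is made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines.

    ASSUMPTION: events look like 'data: {...json...}' and the stream ends
    with 'data: [DONE]', as in OpenAI's chat-completions streaming format.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


if __name__ == "__main__":
    # Hypothetical captured stream, for illustration only.
    sample = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Once upon "}}]}',
        'data: {"choices": [{"delta": {"content": "a time"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_stream_deltas(sample)))
```

Against a live server, you could feed it decoded lines from the HTTP response, e.g. `iter_stream_deltas(line.decode("utf-8") for line in resp)` where `resp` is the object returned by `urllib.request.urlopen`.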

🏞️ Image Generation (Stable-Diffusion-as-a-Service)


Build Midjourney-style Discord bots in seconds.

<img src="docs/assets/hippo_with_glasses_sdxl.jpg" width="400"> <br> <details> <summary> API / Usage</summary> <br>

<b>gRPC API ⚡</b>

```python
from nos.client import Client

client = Client()

# Generate a single 1024x1024 image from a text prompt
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
              width=1024, height=1024, num_images=1)
```

<b>REST API</b>

```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
      "prompts": ["hippo with glasses in a library, cartoon styling"],
      "width": 1024,
      "height": 1024,
      "num_images": 1
    }
  }'
```
</details>
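The REST examples POST a JSON body with `model_id` and `inputs` (plus an optional `method`) to `/v1/infer`. If you want to build that request from Python without extra dependencies, here is a minimal standard-library sketch. `build_infer_request` is a hypothetical helper, not part of the NOS client, and it only mirrors the fields shown in the curl examples; the response schema is not described here, so the sending step is left as a comment.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Default address used by the curl examples in this README.
NOS_HTTP_URL = "http://localhost:8000"


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    base_url: str = NOS_HTTP_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /v1/infer.

    Hypothetical helper: the body mirrors the curl examples only.
    """
    body: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        # Optional model method, e.g. "encode_text" in the CLIP example.
        body["method"] = method
    return urllib.request.Request(
        f"{base_url}/v1/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)
# Sending needs a running server; inspect the JSON before relying on any field:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```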

🧠 Text & Image Embedding (CLIP-as-a-Service)


Build scalable semantic search of images/videos in minutes.

<img src="docs/assets/embedding.png" width="400"> <br> <details> <summary> API / Usage</summary> <br>

<b>gRPC API ⚡</b>

```python
from nos.client import Client

client = Client()

# Encode a text query into a CLIP embedding
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
```

<b>REST API</b>

```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
      "texts": ["fox jumped over the moon"]
    }
  }'
```
</details>
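Semantic search then comes down to comparing embeddings. A minimal sketch, assuming the query and corpus embeddings come from the same CLIP model and have already been converted to plain lists of floats (e.g. with `.tolist()` if they are NumPy arrays): rank the corpus by cosine similarity to the query vector. The helper names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (key, score) pairs, most similar first.

    ASSUMPTION: `corpus` maps e.g. image paths to embeddings from the same
    CLIP model as `query`; this is a linear scan over every entry.
    """
    scores = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scores.sort(key=lambda kv: kv[1], reverse=True)
    return scores[:top_k]
```

A linear scan like this is fine for small collections; at larger scale you would typically store the vectors in a dedicated vector index instead.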

🎙️ Audio Transcription (Whisper-as-a-Service)


Perform real-time audio transcription using Whisper.

<img src="docs/assets/transcription.png" width="400"> <br> <details> <summary> API / Usage</summary> <br>

<b>gRPC API ⚡</b>

```python
from pathlib import Path

from nos.client import Client

client = Client()

model = client.Module("openai/whisper-small.en")
# Upload the local audio file to the server, then transcribe it
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
    # {"chunks": ...}
```

<b>REST API</b>

```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=openai/whisper-small.en' \
  -F 'file=@audio.wav'
```
</details>
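The `/v1/infer/file` examples upload a file as `multipart/form-data`. For a dependency-free Python client, the body that `curl -F` builds can be assembled by hand with the standard library, as in this sketch. The field names `model_id` and `file` come from the curl command; the helper names and the generic `application/octet-stream` part type are assumptions.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",  # ASSUMPTION: generic type
) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the data
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_infer_file_request(
    model_id: str, path: Path, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the request made by the curl -F example."""
    body, content_type = encode_multipart(
        {"model_id": model_id}, "file", path.name, path.read_bytes()
    )
    return urllib.request.Request(
        f"{base_url}/v1/infer/file",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
```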

🧐 Object Detection (YOLOX-as-a-Service)


Run classical computer-vision tasks in 2 lines of code.

<img src="docs/assets/bench_park_detections.png" width="400"> <br> <details> <summary> API / Usage</summary> <br>

<b>gRPC API ⚡</b>

```python
from PIL import Image

from nos.client import Client

client = Client()

model = client.Module("yolox/medium")
# Load a local image and run object detection on it
response = model(images=[Image.open("image.jpg")])
```

<b>REST API</b>

```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=yolox/medium' \
  -F 'file=@image.jpg'
```
</details>
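Detections usually need some post-processing, such as dropping low-confidence boxes. The sketch below assumes a hypothetical response layout with parallel `bboxes`, `scores`, and `labels` entries for a single image. That layout is an assumption, so inspect the actual `response` returned by the `yolox/medium` module before adapting it.

```python
from typing import Any, Dict, List, Sequence


def filter_detections(
    detections: Dict[str, Sequence[Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep detections with confidence >= min_score, highest score first.

    ASSUMPTION (hypothetical layout): `detections` holds parallel "bboxes",
    "scores", and "labels" sequences describing one image.
    """
    kept = []
    for box, score, label in zip(
        detections["bboxes"], detections["scores"], detections["labels"]
    ):
        if score >= min_score:
            kept.append({"bbox": list(box), "score": float(score), "label": label})
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept
```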

⚒️ Custom models


Want to run models not supported by NOS? You can easily add your own models following the examples in the NOS Playground.

📄 License

This project is licensed under the Apache-2.0 License.

📡 Telemetry

NOS collects anonymous usage data using Sentry to help us understand how the community uses NOS and to prioritize features. You can opt out of telemetry by setting `NOS_TELEMETRY_ENABLED=0`.
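If you launch the server from a Python script, one way to opt out is to pass the variable in the environment of the `nos serve` process. This sketch assumes NOS reads `NOS_TELEMETRY_ENABLED` from the environment of the process that starts it; the helper names are hypothetical, and the launch function is defined but not called.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_without_telemetry(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with telemetry disabled."""
    env = dict(os.environ if base is None else base)
    # ASSUMPTION: NOS checks this variable in the launching process's environment.
    env["NOS_TELEMETRY_ENABLED"] = "0"
    return env


def serve_up_without_telemetry() -> subprocess.CompletedProcess:
    """Run the Quickstart's `nos serve up` command with telemetry disabled."""
    return subprocess.run(
        ["nos", "serve", "up", "--http", "--logging-level", "INFO"],
        env=env_without_telemetry(),
        check=True,
    )
```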

🤝 Contributing

We welcome contributions! Please see our contributing guide for more information.



This article is auto-generated from autonomi-ai/nos via the GitHub API. Last fetched: 6/25/2026.