VLM Collection
Repositories tagged with "vlm"
transformers
huggingface
"🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training."
UI-TARS-desktop
bytedance
"The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra"
sglang
sgl-project
"SGLang is a high-performance serving framework for large language models and multimodal models."
runanywhere-sdks
RunanywhereAI
"Production ready toolkit to run AI locally"
notebooks
roboflow
"A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL."
anomaly-detection-resources
yzhao062
"Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM and VLM works!"
nexa-sdk
qualcomm
"Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more."
ERNIE
PaddlePaddle
"The official repository for ERNIE 4.5 and ERNIEKit, its industrial-grade development toolkit based on PaddlePaddle."
VLM-R1
om-ai-lab
"Solve Visual Understanding with Reinforced VLMs"
UltraRAG
OpenBMB
"A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines"
LLM-RL-Visualized
changyeyu
"100+ original LLM / RL principle diagrams, from the author of 'Large Model Algorithms' (100+ LLM/RL Algorithm Maps)"
star-vector
joanrod
"StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision."
lmms-eval
EvolvingLMMs-Lab
"One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks"
PromptEnhancer
Hunyuan-PromptEnhancer
"[CVPR 2026] PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation."
MiniMax-01
MiniMax-AI
"The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention"
Local-File-Organizer
QiuYannnn
"An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval."
Skywork-R1V
SkyworkAI
"Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning."
OSWorld
xlang-ai
"[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments"