Image Understanding Collection
Repositories tagged with "image-understanding"
vllm-mlx
waybarrios
“OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.”
Lance
bytedance
“A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.”
Chat-UniVi
PKU-YuanGroup
“[CVPR 2024 Highlight] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding”
UniWorld
PKU-YuanGroup
“UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation”
openai-chat-api-workflow
yohasebe
“An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models. It also allows image generation/editing/understanding, speech-to-text conversion, and text-to-speech synthesis.”
relationformer
suprosanna
“A Unified Framework for Image-to-Graph Generation. Paper accepted @ ECCV22.”
Mobile-O
Amshaker
“[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device”
Ming-UniVision
inclusionAI
“Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer”
UniPercept
thunderbolt215
“[ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture”
WACV-2024-Papers
DmitryRyumin
“WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!”
UniMedVL
uni-medical
“Official implementation of "UniMedVL: Unifying Medical Multimodal Understanding and Generation through Observation-Knowledge-Analysis" - A unified medical vision-language model that integrates multimodal understanding and generation capabilities.”
DynamicVis
KyanChen
“This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"”
relsim
thaoshibe
“relsim: Relational Visual Similarity | pip install relsim (CVPR 2026)”
Awesome-Multimodal-Reasoning
The-Martyr
“Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs”
luma-mcp
JochenYang
“Multi-model visual understanding MCP server supporting GLM-4.6V, DeepSeek-OCR (free), and Qwen3-VL-Flash. Provides visual processing capabilities for AI coding models that do not support image understanding.”
