Image Understanding Collection

Repositories tagged with "image-understanding"

TCG-style cards with ATK/DEF/SPD stats

UNCOMMON

⭐1.3kHP

◆

🔮Psychic

★★

vllm-mlx

waybarrios

Python, anthropic, apple-silicon

“OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.”

★ 1.3k

188 forks

GitPedia #417

2/5

UNCOMMON

⭐1.2kHP

◆

🔮Psychic

★★

Lance

bytedance

Python, image-editing, image-generation

“A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.”

★ 1.2k

79 forks

GitPedia #472

2/5

UNCOMMON

⭐944HP

◆

🔮Psychic

★★

Chat-UniVi

PKU-YuanGroup

Python, image-understanding, large-language-models

“[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding”

★ 944

48 forks

GitPedia #096

2/5

UNCOMMON

⭐876HP

◆

🔮Psychic

★★

UniWorld

PKU-YuanGroup

Python, diffusion, high-level-feature

“UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation”

★ 876

29 forks

GitPedia #369

2/5

COMMON

⭐316HP

◆

🌸Fairy

★

openai-chat-api-workflow

yohasebe

Ruby, ai, alfred

“🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image generation/editing/understanding 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈”

★ 316

11 forks

GitPedia #354

1/5

COMMON

⭐151HP

◆

📦Normal

★

relationformer

suprosanna

image-understanding, road-network

“A Unified Framework for Image-to-Graph Generation. Paper accepted @ ECCV22.”

★ 151

22 forks

GitPedia #299

1/5

COMMON

⭐149HP

◆

🔮Psychic

★

Mobile-O

Amshaker

Python, image-edit, image-generation-model

“[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device”

★ 149

15 forks

GitPedia #129

1/5

COMMON

⭐143HP

◆

🔮Psychic

★

Ming-UniVision

inclusionAI

Python, image, image-editing

“Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer”

★ 143

5 forks

GitPedia #869

1/5

COMMON

⭐143HP

◆

🔮Psychic

★

UniPercept

thunderbolt215

Python, benchmark, dataset

“[ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture”

★ 143

1 fork

GitPedia #309

1/5

COMMON

⭐97HP

◆

🔮Psychic

★

WACV-2024-Papers

DmitryRyumin

Python, 3d-computer-vision, 3d-sensor

“WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!”

★

13 forks

GitPedia #076

1/5

COMMON

⭐94HP

◆

🔮Psychic

★

UniMedVL

uni-medical

Python, ai4health, cfp

“Official implementation of "UniMedVL: Unifying Medical Multimodal Understanding and Generation through Observation-Knowledge-Analysis" - A unified medical vision-language model that integrates multimodal understanding and generation capabilities.”

★

9 forks

GitPedia #583

1/5

COMMON

⭐86HP

◆

🔮Psychic

★

DynamicVis

KyanChen

Python, change-detection, computer-vision

“This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"”

★

2 forks

GitPedia #913

1/5

COMMON

⭐84HP

◆

🔮Psychic

★

relsim

thaoshibe

Python, cvpr, cvpr2026

“🍑 relsim: Relational Visual Similarity | pip install relsim 🌍 (CVPR 2026)”

★

1 fork

GitPedia #239

1/5

COMMON

⭐71HP

◆

📦Normal

★

Awesome-Multimodal-Reasoning

The-Martyr

chain-of-thought, cot

“Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal LLMs”

★

5 forks

GitPedia #651

1/5

COMMON

⭐69HP

◆

💎Aqua

★

luma-mcp

JochenYang

TypeScript, image-understanding, mcp

“Multi-model visual understanding MCP server supporting GLM-4.6V, DeepSeek-OCR (free), Qwen3-VL-Flash, and others. Provides visual processing capabilities for AI coding models that do not support image understanding.”

★

8 forks

GitPedia #916

1/5
