Multi Modal Collection
Repositories tagged with "multi-modal"
agentscope
agentscope-ai
“Build and run agents you can see, understand and trust.”
MiniCPM-V
OpenBMB
“A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone”
ten-framework
TEN-framework
“Open-source framework for conversational voice AI agents”
InternVL
OpenGVLab
“[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance”
modelscope
modelscope
“ModelScope: bring the notion of Model-as-a-Service to life.”
big-AGI
enricoros
“AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.”
CogVLM
zai-org
“a state-of-the-art-level open visual language model | multimodal pre-trained model”
data-juicer
datajuicer
“Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷”
Chinese-CLIP
OFA-Sys
“Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.”
valhalla
valhalla
“Open Source Routing Engine for OpenStreetMap”
DALLE-pytorch
lucidrains
“Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch”
marqo
marqo-ai
“Ecommerce Search and Discovery - marqo.ai”
DeepKE
zjunlp
“[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction”
OmniGen
VectorSpaceLab
“OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340”
VLMEvalKit
open-compass
“Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks”
VisualGLM-6B
zai-org
“Chinese and English multimodal conversational language model”
LLamaSharp
SciSharp
“A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.”
Video-LLaVA
PKU-YuanGroup
“【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection”