Repositories tagged with "vision-language-learning"
Ovis
AIDC-AI
“A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.”
RLAIF-V
RLHF-V
“[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness”
OPERA
shikiw
“[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation”
Modality-Integration-Rate
“[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".”