GitPedia

VLMs Zero to Hero

This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.

From SkalskiP · Updated June 18, 2026

The project is written primarily in Jupyter Notebook, is distributed under the Apache License 2.0, and was first published in 2024. It has 1,180 stars and 102 forks on GitHub. Key topics include bert-model, clip, computer-vision, embeddings, and gpt.

<div align="center"> <h1 align="center">VLMs zero-to-hero</h1> <p>coming: january 2025...</p> </div>

hello

Welcome to VLMs Zero to Hero! This series will take you on a journey from the
fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.

tutorials

| notebook | open in colab | video | paper |
| --- | --- | --- | --- |
| 01.01. Word2Vec: Distributed Representations of Words and Phrases and their Compositionality | link | soon | link |

roadmap

natural language processing (NLP) fundamentals

computer vision (CV) fundamentals

early vision-language models

scale and efficiency

modern vision-language models

extra

contribute and suggest more papers

Are there important papers, models, or techniques we missed? Do you have a favorite
breakthrough in vision-language research that isn't listed here? We’d love to hear
your suggestions!


This article is auto-generated from SkalskiP/vlms-zero-to-hero via the GitHub API. Last fetched: 6/19/2026