
Data Engineering HowTo

A list of useful resources to learn Data Engineering from scratch

From adilkhash · Updated June 25, 2026

- [The AI Hierarchy of Needs](https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007)
- [The Rise of the Data Engineer](https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603)
- [The Downfall of the Data Engineer](https://medium.com/@maximebeauchemin/the-downfall-of-the-data-engineer-5bfb701e5d6b)
- A Beginner’s Guide to Data Engineering
  - [Part I](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7)
  - [Part II](https://medium.com/@rchang/a...

The project was first published in 2019. It has 3,995 stars and 568 forks on GitHub. Key topics include: cloud-providers, data-engineering, data-pipeline, distributed-systems, scala.

How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, a series about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
  • Apache Spark is a unified analytics engine for large-scale data processing
  • Apache Kafka is a distributed streaming platform
  • Luigi is a Python package that helps you build complex pipelines of batch jobs
  • Dagster.io is a system for building modern data applications
  • Prefect includes everything you need to create and run data applications
  • Metaflow helps you build and manage real-life data science projects with ease
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics
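
Several of the tools above (Airflow, Luigi, Dagster, Prefect) share one core idea: a workflow is a set of tasks plus the dependencies between them, and each task runs only after its upstream tasks finish. The sketch below illustrates that idea using only the Python standard library (`graphlib`, Python 3.9+). It does not use or imitate the API of any listed tool; the `run_pipeline` helper and the `extract`, `transform`, and `load` tasks are hypothetical names made up for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after the tasks it depends on.

    tasks: maps a task name to a callable.
    dependencies: maps a task name to a tuple of upstream task names.
    Returns the execution order and each task's result.
    """
    results = {}
    order = []
    # static_order() yields each task only after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = tasks[name](*upstream)
        order.append(name)
    return order, results


# Hypothetical three-step ETL job.
def extract():
    return [" 1", "2 ", "x", "3"]


def transform(rows):
    # Keep only rows that are whole numbers once whitespace is ignored.
    return [int(r) for r in rows if r.strip().isdigit()]


def load(values):
    # Stand-in for writing to a warehouse: just aggregate the values.
    return sum(values)


tasks = {"extract": extract, "transform": transform, "load": load}
dependencies = {
    "extract": (),
    "transform": ("extract",),
    "load": ("transform",),
}

order, results = run_pipeline(tasks, dependencies)
print(order)
print(results["load"])
```

Real orchestrators add scheduling, retries, monitoring, and distributed execution on top of this dependency-ordering step, which is why they are worth learning rather than rebuilding.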

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests

Contributors

Showing top 12 contributors by commit count.


This article is auto-generated from adilkhash/Data-Engineering-HowTo via the GitHub API. Last fetched: 6/27/2026