We know exactly what to learn to become a cool DWH developer
Step by Step
Querying data using SQL is an essential skill for anyone who works with data.
SQL
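To make this concrete, here is a minimal sketch of a typical aggregate query, run from Python against an in-memory SQLite database. The `orders` table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.5), ("alice", 20.0)],
)

# Total amount per customer, largest total first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)
```

The same `GROUP BY` and `ORDER BY` pattern carries over to most relational engines, although dialect details differ.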
As a data engineer, you'll write a lot of code for tasks such as ETL jobs and data pipelines. Python is the de facto standard language for data engineering.
Programming language
Relational databases (RDBMSs) are the basic building blocks for storing application data. A data engineer should know how to design and architect their schemas and understand the concepts related to them.
Relational Databases - Design & Architecture
NoSQL is a term for any non-relational database model: key-value, document, column, graph, and more. A basic acquaintance is required, but how deep to go into any one model depends on the job.
NoSQL
Column databases are a kind of NoSQL database. They deserve their own section because they are essential for the data engineer: working with Big Data online usually requires a columnar back end.
Columnar Databases
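As a rough illustration of why a columnar layout suits analytics, the sketch below stores the same hypothetical records row by row and column by column. It is a toy model in plain Python, not how any particular database is implemented.

```python
# Hypothetical records, for illustration only.
rows = [
    {"user": "a", "country": "DE", "amount": 10},
    {"user": "b", "country": "US", "amount": 25},
    {"user": "c", "country": "DE", "amount": 5},
]

# Row-oriented layout: an aggregate has to walk through every full record.
row_total = sum(record["amount"] for record in rows)

# Column-oriented layout: each column is stored as its own contiguous list,
# so an aggregate reads only the one column it needs.
columns = {name: [record[name] for record in rows] for name in rows[0]}
col_total = sum(columns["amount"])

print(row_total, col_total)
```

Both layouts give the same answer. The difference is how much data must be read to get it, which matters once tables have many columns and billions of rows.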
Understand the concepts behind data warehouses and familiarize yourself with common data warehouse solutions.
Data Warehouses
Data modeling concepts for OLAP databases. Modeling the data correctly is essential for a functioning data warehouse.
OLAP Data Modeling
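One common OLAP modeling pattern is the star schema: a central fact table of measurements joined to small dimension tables. The text above does not name a specific pattern, so treat the following as one possible sketch. The table names (`fact_sales`, `dim_product`, `dim_date`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each measurement.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- The fact table holds the measurements plus keys into the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        date_id INTEGER REFERENCES dim_date (date_id),
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 12.0), (1, 11, 8.0), (2, 11, 40.0);
    """
)

# A typical analytical query: revenue sliced by category and year.
report = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
conn.close()

print(report)
```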
The first generation of data processing, using Hadoop and MapReduce. Everyone should know how it works, but going deep into the details and operations is recommended only if necessary. Today, focus more on streaming with tools like Spark.
Batch Data Processing & MapReduce
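The MapReduce idea can be sketched in a few lines of plain Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group. This is a single-process toy with hypothetical function names, not the Hadoop API. A real framework distributes each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big pipelines", "data pipelines"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```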
The next generation of data processing. It is suggested to get a good grasp of the subject from the Streaming Systems book and then dive deep into a specific tool such as Kafka, Spark, or Flink.
Stream Data Processing
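A core streaming concept is windowing: grouping an unbounded sequence of events into finite time buckets. The sketch below counts events per fixed (tumbling) window in plain Python. The function name and the `(timestamp, key)` event shape are assumptions for illustration. Real engines also handle late and out-of-order data, which this toy ignores.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key), consuming one event at a time."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one fixed-size window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (event_time_in_seconds, event_type) pairs.
events = [(1, "click"), (4, "click"), (11, "view"), (12, "click"), (19, "click")]
result = tumbling_window_counts(events, window_seconds=10)

print(result)
```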
Scheduling tools for data processing. Airflow is considered the de facto standard, but any understanding of DAGs (directed acyclic graphs of tasks) will be useful.
Pipeline and Workflow Management
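To show what a DAG of tasks means in practice, here is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The task names are hypothetical, and this is not Airflow code. It only illustrates how a scheduler can derive a valid run order from task dependencies.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "load": {"join"},
}

# static_order() yields the tasks so that every dependency comes first.
order = list(TopologicalSorter(dag).static_order())

print(order)
```

The two extract tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel. Adding a cycle (for example, making `extract_orders` depend on `load`) makes `static_order()` raise `graphlib.CycleError`, which is why the graph has to be acyclic.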
How to manage sensitive data, comply with regulations, and more.