Course "Data Engineer" - course 95,000 rub. from Yandex Workshop, training 6.5 months, Date: December 11, 2023.
For practicing developers
Learn to build infrastructure for working with data and systematize your knowledge, whether to apply it in your current role or to move into data engineering.
For aspiring data engineers
Structure your knowledge: in addition to clear theory there is a lot of practice. You will gain experience working on projects, which will help you build a portfolio, stand out from other candidates, and not get lost in real work.
For data science specialists and analysts
Master skills that will help you handle tasks more effectively: build data pipelines, design data marts, set up ETL processes, and collect raw data in large volumes.
Updating the data model
Module 1 (2 weeks)
The company continues to immerse you in its processes. The data you were working with has been updated, so you need to change the data model.
In this course you:
- understand how the company's database is structured;
- update the current database structure in line with new business requirements;
- prepare new data marts and metrics for analysts and managers.
Technologies and tools:
- PostgreSQL
+1 project in portfolio
Build a data mart with incremental loading for online store audience analytics.
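To make incremental loading concrete, here is a minimal Python/psycopg2 sketch. The staging.orders source, the mart.customer_daily_stats mart (with a unique key on customer_id, order_date), and the etl.watermarks bookkeeping table are hypothetical names, not the course's actual schema.

```python
# Sketch: incrementally refresh a data mart from a staging table.
import psycopg2

MART_UPSERT = """
    INSERT INTO mart.customer_daily_stats (customer_id, order_date, orders_cnt, revenue)
    SELECT customer_id, order_date::date, count(*), sum(amount)
    FROM staging.orders
    WHERE order_date > %(watermark)s           -- only rows newer than the last load
    GROUP BY customer_id, order_date::date
    ON CONFLICT (customer_id, order_date)      -- requires a unique key; makes re-runs idempotent
    DO UPDATE SET orders_cnt = EXCLUDED.orders_cnt,
                  revenue    = EXCLUDED.revenue;
"""

def load_increment(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # High-water mark saved by the previous run.
        cur.execute("SELECT last_loaded_at FROM etl.watermarks WHERE mart = 'customer_daily_stats'")
        watermark = cur.fetchone()[0]
        cur.execute(MART_UPSERT, {"watermark": watermark})
        # Advance the watermark so the next run picks up only new rows
        # (a production job would use max(order_date) of the processed batch).
        cur.execute("UPDATE etl.watermarks SET last_loaded_at = now() WHERE mart = 'customer_daily_stats'")
```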
DWH: data model revision
Module 2 (3 weeks)
The company is growing, and its data architecture is becoming more complex. You are given a task: optimize the processes around data.
In this course you:
- think through the transition from the old database schema to the new one while minimizing business losses (zero-downtime deployment);
- prepare data migration;
- take into account possible problems and design an option to roll back changes;
- implement a new database structure and adapt it to existing processes around data.
Technologies and tools:
- PostgreSQL
- Python
+1 project in portfolio
You will put the data model in order and migrate data within the online store's existing warehouse.
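A common zero-downtime approach is the expand/contract pattern: create the new structure alongside the old one, backfill it in small batches while the old schema keeps serving traffic, then switch readers over; rolling back is simply dropping the new objects. Below is a hedged Python/psycopg2 sketch of the backfill step with hypothetical orders and orders_v2 tables.

```python
# Sketch: batched backfill of a new table during an expand/contract migration.
import psycopg2

BATCH = 10_000  # small batches keep locks and transactions short

def backfill(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # commit after every batch
    with conn.cursor() as cur:
        while True:
            cur.execute(
                """
                INSERT INTO orders_v2 (order_id, user_id, status, created_at)
                SELECT o.order_id, o.user_id, o.status, o.created_at
                FROM orders o
                LEFT JOIN orders_v2 n USING (order_id)
                WHERE n.order_id IS NULL      -- copy only rows not migrated yet
                LIMIT %s
                """,
                (BATCH,),
            )
            if cur.rowcount == 0:  # nothing left to copy: backfill complete
                break
    conn.close()
```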
ETL: data preparation automation
Module 3 (3 weeks)
You now know almost everything about the company's data warehouse. It's time to rethink ETL processes.
In this course you:
- automate the data pipeline;
- set up automated extraction of data from sources;
- learn to load data into the database regularly and incrementally.
Technologies and tools:
- Python
- Airflow
- PostgreSQL
+1 project in portfolio
Build a pipeline that automatically receives, processes, and loads data from sources into a data mart for an e-commerce project.
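As an illustration, here is a minimal Airflow 2.x DAG in the TaskFlow style with the daily, incremental shape this module describes; the task bodies are placeholders, not the course's actual pipeline.

```python
# Sketch: a daily extract-transform-load DAG (Airflow 2.x TaskFlow API).
import pendulum
from airflow.decorators import dag, task

@dag(
    schedule="@daily",  # regular runs
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
)
def shop_sales_pipeline():
    @task
    def extract(ds=None):
        # `ds` is the logical date Airflow injects; a real task would
        # query the source only for this date (incremental load).
        return [{"order_id": 1, "status": "paid", "amount": 100, "date": ds}]

    @task
    def transform(rows):
        return [r for r in rows if r["status"] == "paid"]

    @task
    def load(rows):
        print(f"would upsert {len(rows)} rows into the mart")

    load(transform(extract()))

shop_sales_pipeline()
```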
Data quality check
Module 4 (1 week)
You want to be sure that your first pipelines work correctly: data quality must be checked, and failures must be caught in time.
In this course you:
- understand how to use metadata and documentation;
- evaluate data quality (see the sketch after this list).
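A data quality check can be as simple as a set of SQL assertions run after each load that fail the pipeline loudly. A minimal Python/psycopg2 sketch, with a hypothetical mart.customer_daily_stats table:

```python
# Sketch: post-load data quality assertions.
import psycopg2

CHECKS = {
    "mart is not empty":
        "SELECT count(*) > 0 FROM mart.customer_daily_stats",
    "no NULL keys":
        "SELECT count(*) = 0 FROM mart.customer_daily_stats WHERE customer_id IS NULL",
    "data is fresh":
        "SELECT max(order_date) >= current_date - 1 FROM mart.customer_daily_stats",
}

def run_checks(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            if not cur.fetchone()[0]:
                # Raising makes the orchestrator mark the run as failed.
                raise ValueError(f"data quality check failed: {name}")
            print(f"OK: {name}")
```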
DWH for multiple sources
Module 5 (2 weeks)
You continue to explore the DWH: the company keeps growing, and so does the volume of its data.
In this course you:
- build a DWH from scratch on a relational DBMS;
- get acquainted with MongoDB as a data source.
Technologies and tools:
- PostgreSQL
- MongoDB
+1 project in portfolio
You will design and implement a DWH for an in-house startup.
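To show what "MongoDB as a data source" can look like, here is a hedged sketch that pulls an increment of documents with pymongo and upserts them into a PostgreSQL staging table; the startup_db.users collection, its fields, and staging.users are hypothetical names.

```python
# Sketch: incremental extract from MongoDB into a PostgreSQL staging table.
from datetime import datetime

import psycopg2
from pymongo import MongoClient

def replicate_users(mongo_uri: str, pg_dsn: str, since: datetime) -> None:
    client = MongoClient(mongo_uri)
    # Only documents changed after the previous run.
    docs = client.startup_db.users.find({"updated_at": {"$gt": since}})

    with psycopg2.connect(pg_dsn) as conn, conn.cursor() as cur:
        for doc in docs:
            cur.execute(
                """
                INSERT INTO staging.users (user_id, name, updated_at)
                VALUES (%s, %s, %s)
                ON CONFLICT (user_id) DO UPDATE
                    SET name = EXCLUDED.name,
                        updated_at = EXCLUDED.updated_at
                """,
                (str(doc["_id"]), doc.get("name"), doc["updated_at"]),
            )
```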
Analytical databases
Module 6 (2 weeks)
There is more and more loosely structured data that also needs to be stored and processed, so we will introduce the concept of analytical databases using the Vertica DBMS as an example.
In this course you:
- study storage organization in Vertica;
- learn how to do basic operations with data in Vertica;
- build a simple data warehouse in Vertica.
Technologies and tools:
- Vertica
- PostgreSQL
- Airflow
- S3
+1 project in portfolio
Build a DWH in Vertica for a high-load messenger system with loosely structured data.
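For a taste of the basic operations, here is a hedged sketch using the vertica-python driver: a table whose definition sets the sort order and cluster segmentation of its superprojection, plus a bulk COPY load. The connection details, the stg.messages table, and the file path are hypothetical.

```python
# Sketch: create a Vertica table and bulk-load it with COPY.
import vertica_python

conn_info = {
    "host": "vertica.example.com", "port": 5433,
    "user": "dwh", "password": "***", "database": "dwh",
}

DDL = """
CREATE TABLE IF NOT EXISTS stg.messages (
    message_id INT,
    user_id    INT,
    sent_at    TIMESTAMP,
    body       VARCHAR(4096)
)
ORDER BY sent_at, user_id                 -- sort order of the superprojection
SEGMENTED BY HASH(message_id) ALL NODES;  -- distribute rows across the cluster
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(DDL)
    # Vertica is loaded in bulk with COPY, not row-by-row INSERTs.
    cur.execute("COPY stg.messages FROM LOCAL '/data/messages.csv' DELIMITER ','")
```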
Data Lake Organization
Module 7 (4 weeks)
Classic solutions can no longer handle the volume of data. To meet new business challenges, you will build and populate a Data Lake.
In this course you:
- study the Data Lake architecture;
- learn to process data in an MPP system;
- fill the Data Lake with data from sources;
- practice data processing using PySpark and Airflow.
Technologies and tools:
- Hadoop
- MapReduce
- HDFS
- Apache Spark (PySpark)
+1 project in portfolio
Build a Data Lake and automate the loading and processing of data in it.
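As a hedged illustration of PySpark in the lake, this sketch reads raw JSON from the raw zone of HDFS and writes a cleaned, partitioned Parquet layer; the paths and fields are hypothetical.

```python
# Sketch: raw JSON -> cleaned, partitioned Parquet in the Data Lake.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("fill_data_lake").getOrCreate()

# Raw zone: events exactly as they arrived from the source.
raw = spark.read.json("hdfs:///data/raw/events/date=2023-12-11")

cleaned = (
    raw.where(F.col("event_type").isNotNull())       # drop malformed events
       .withColumn("event_date", F.to_date("event_ts"))
)

# Curated zone: columnar format, partitioned for downstream queries.
(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("hdfs:///data/curated/events"))
```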
Stream processing
Module 8 (3 weeks)
You have overcome the difficulties of large data volumes, but a new task has appeared: you need to help the business make decisions faster. Here you will need stream data processing (streaming).
In this course you:
- consider the features of stream data processing;
- build your own streaming system;
- build a data mart using real-time data.
Technologies and tools:
- Kafka
- Spark Streaming
+1 project in portfolio
You will develop a real-time data processing system.
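A minimal Spark Structured Streaming sketch of such a system: it reads JSON events from a Kafka topic and maintains a windowed revenue aggregate in near real time. The broker address, topic name, and event schema are hypothetical.

```python
# Sketch: Kafka -> windowed aggregate with Spark Structured Streaming.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("rt_mart").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("amount", DoubleType())
          .add("ts", TimestampType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Revenue per 5-minute window, tolerating events up to 10 minutes late.
per_window = (events
              .withWatermark("ts", "10 minutes")
              .groupBy(F.window("ts", "5 minutes"))
              .agg(F.sum("amount").alias("revenue")))

query = (per_window.writeStream
         .outputMode("update")
         .format("console")   # a real project would write to a mart instead
         .start())
query.awaitTermination()
```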
Cloud technologies
Module 9 (3 weeks)
Now you can work with both large volumes of data and streams. All that remains is to automate the scaling of systems using cloud services.
In this course you will learn to implement the solutions you have already studied, but in the cloud (using Yandex Cloud as an example).
Technologies and tools:
- Yandex Cloud
- Kubernetes
- kubectl
- Redis
- PostgreSQL
+1 project in portfolio
You will develop infrastructure for storing and processing data in the cloud.
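One piece of such an infrastructure is the fast serving layer suggested by this module's stack (Redis): the latest aggregates are kept in Redis for low-latency reads, with PostgreSQL holding the durable copy. A small hedged sketch with redis-py; the host and key layout are hypothetical.

```python
# Sketch: hot serving layer in Redis for precomputed user statistics.
import json
import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)

def save_user_stats(user_id: str, stats: dict) -> None:
    # Hot copy for the API; the TTL keeps stale entries from lingering.
    r.set(f"user_stats:{user_id}", json.dumps(stats), ex=3600)

def get_user_stats(user_id: str) -> dict | None:
    raw = r.get(f"user_stats:{user_id}")
    return json.loads(raw) if raw else None
```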
Graduation project
Module 10 (3 weeks)
Confirm that you have mastered the new skills.
Here you will independently select and implement a solution to a business problem. This will once again reinforce your command of the tools you have learned, as well as your independence.