Modern approaches to data management: a course from IBS Training Center. Price: RUB 27,900. Duration: 16 hours. Date: November 26, 2023.
Miscellaneous / December 02, 2023
When designing applications, one of the important decisions is how to store data. For several decades, relational DBMSs were the first and only option; projects differed only in the degree of normalization, the location of business logic, and so on. Over the last ten to fifteen years, alternative systems have flourished, from object-oriented and document-oriented DBMSs to distributed file systems and data stream processing systems. The course examines a range of modern solutions for long-term, secure data storage, the reasons solutions of different classes emerged, their advantages and disadvantages, and their preferred methods of use.
Topics covered:
1. Evolution of approaches to data storage (theory – 2 hours).
Databases, data warehouses, database engines, massively parallel architectures, hyperconvergence.
2. Relational model (theory – 2 hours).
What problems does it solve, and at what cost?
Replication, sharding, distributed transactions.
3. Minimal key-value model (theory – 1 hour, practice – 1 hour).
Key structure options, value structure options, software interfaces.
Efficiency of using non-relational databases: necessary and sufficient conditions [Cassandra, HBase].
4. Document-oriented model [MongoDB] (theory – 0.5 hours, practice – 0.5 hours).
5. Distributed file systems instead of data models: cluster architecture [HDFS] (theory – 1 hour, practice – 1 hour).
6. SQL over distributed file systems (theory – 1 hour, practice – 2 hours).
Architecture options, file formats, restrictions, transactions [Hive, Spark, Spark SQL, Parquet, ORC].
7. Distributed in-memory data storage systems [Hazelcast, Ignite, Tarantool] (theory – 1 hour).
8. Distributed OLAP systems [ClickHouse, Druid] (theory – 1 hour).
9. Data stream processing [Spark Streaming] (theory – 1 hour).
10. Self-configuring and autonomous databases (theory – 1 hour).