Apache Spark Framework for Developers: Advanced Level. Course by IBS Training Center, 41,500 rub., 24 hours of training, start date November 26, 2023.
December 05, 2023
The training gives a detailed understanding of the internal structure and operation of the Apache Spark framework, covering Spark Core (RDD), Spark SQL, Spark Streaming, and Spark Structured Streaming. It examines how Spark cluster components are launched under different cluster managers, how resources (primarily memory) are allocated, and how the schedulers work. The advantages of the Tungsten internal data representation format and the operation of the Catalyst optimizer are explored in detail.
Topics covered:
Spark Internal Architecture, Spark Runtime Environment
Setting Up SparkContext and SparkConf
RDD Internals, Logical Structure
Best Practices for Programming with RDD
Physical Plan: Jobs, Stages, Tasks
Schedulers and Physical Plan Execution
Memory tuning, serialization, caching, garbage collection
Datasource API, Tungsten internal data representation, file formats
Catalyst Optimizer
Micro-batch Spark Streaming: receiving and outputting data
Structured Streaming: receiving and outputting data