SRE practices and tools - free course from Otus, 5 months of training, Date: December 1, 2023.
Miscellaneous / December 04, 2023
Site Reliability Engineering (SRE) is an approach to organizing IT operations. SRE teams use software as a tool to manage systems, solve problems, and automate operational tasks. SRE takes tasks that were historically performed by operators and system administrators, often manually, and hands them to engineers or operations teams who handle them with software and automation instead.
SRE is the practice of building scalable and highly reliable software systems. It helps manage large systems through infrastructure code, an approach that is more scalable and resilient for system administrators who manage thousands or hundreds of thousands of computers.
Large companies like Google and Netflix have a rotation practice in which developers, testers, or operations engineers temporarily change positions and work in other teams for several months. We suggest you conduct a similar experiment.
The course is suitable for:
- Developers who want to grow further and are responsible for their services in production environments
- SREs and systems engineers whose tasks include ensuring reliability and availability
- Infrastructure and platform engineers who have begun providing their services to other teams
- Technical directors, managers, and team leads who want to understand and implement SRE best practices and tools
In the course you will learn how to:
- Implement SRE practices in your organization
- Manage the reliability, availability, and efficiency of services
- Manage changes
- Monitor systems
- Respond to incidents and performance problems
We will perform practical tasks based on the following technology stack: Linux, AWS, GCP, Kubernetes, Ansible, Terraform, Prometheus, Go, Python.
Upon completion of the course you will:
- Be familiar with SRE practices and tools
- Be able to explain SRE principles to colleagues
- Understand how to build SRE processes in the context of interaction with other departments of the company
- Be able to apply the acquired knowledge in your daily work, improving life for yourself, your colleagues, the project, and the company
8 courses
20+ years of experience in custom development projects in IT. Dozens of successful projects, including those under government contracts. Experience in developing and implementing ERP systems and open-source solutions, and in supporting high-load applications. Teacher of courses on Linux, Kubernetes, MLOps, DataOps, SolutionArchitect, IaC, and SRE, as well as a mentor of the HighLoad course.
2 courses
I help people understand what exactly computers do. I have worked in fintech, telecom, and game development, and in recent years in business and technology consulting. My strengths are planning, developing, deploying, and debugging heterogeneous environments, and working with businesses and clients. Stack: Linux, Ansible, Terraform, data center infrastructure.
Introduction to SRE
- Topic 1. Introduction to SRE
- Topic 2. Basic principles of SRE
SRE practices
- Topic 3. SLI, SLA, SLO, and risk management
- Topic 4. Automation 1
- Topic 5. Automation 2
- Topic 6. Configuration management practice: Ansible
- Topic 7. Configuration management practice: Terraform
- Topic 8. Monitoring and alerting practice
- Topic 9. Q&A session
- Topic 10. Continuous Delivery and change management
- Topic 11. Release management practice
- Topic 12. Configuration management practice: Helm
- Topic 13. System reliability testing practice
- Topic 14. Load management practice to prevent overloads and failures
- Topic 15. On-call practice and the life cycle of an SRE team
- Topic 16. Postmortem practice
- Topic 17. Diagnosis and problem-solving practice
Project work
- Topic 18. Choosing a topic and organizing the project work
- Topic 19. Consultation on projects and homework, intermediate review
- Topic 20. Project defense