Course Outline
Introduction
- How SRE integrates traditional IT with software development.
- The necessity of automation and observability.
- The roles of software engineers versus system administrators.
- Site Reliability Engineers versus DevOps engineers.
IT System Overview
- System architecture in on-premises and cloud environments.
Overview of SRE Principles and Practices
- Infrastructure as Code.
- The role of containerization and orchestration (Docker, Kubernetes, etc.).
- Continuous Integration, Continuous Deployment, and Continuous Delivery.
- Observability.
Evaluating an IT System
- Assessing team and organizational resources.
- Mapping out systems and processes.
- Estimating the potential impact of SRE.
- The role of the software engineering team.
- The role of the operations team.
- The role of management.
Maintaining System Reliability
- Describing and measuring desired service reliability.
- Understanding Service Level Objectives (SLOs).
- Understanding Service Level Indicators (SLIs) and Service Level Agreements (SLAs).
- Working with Error Budgets.
- Developing an SLO.
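As a rough illustration of the error budget topic above, the sketch below turns an availability SLO into an allowed-downtime figure and computes how much of a request-based budget is left. The function names, the 30-day window, and the sample numbers are assumptions made for this example only; they are not taken from the course material.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    # The error budget is the share of the window that may be "bad".
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLI.

    Returns 1.0 when nothing is spent, 0.0 when the budget is exhausted,
    and a negative value when the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves about 43.2 minutes of downtime.
    print(f"{error_budget_minutes(0.999):.1f} minutes")
    # 500 failed requests out of 1,000,000 spends about half of the budget.
    print(f"{budget_remaining(0.999, 999_500, 1_000_000):.2f} of budget left")
```

In this framing the SLI is the measured ratio of good events to total events, the SLO is the target for that ratio, and the error budget is whatever gap the target leaves below 100%.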
Optimizing System Administration
- Setting up a development environment.
- Evaluating SRE tools.
- Prioritizing tasks for automation.
- Writing software.
Deploying "Infrastructure as Code"
- Testing and iterating on code.
- Making a system anti-fragile.
- Learning from failure.
Monitoring a System
- Observing system performance.
- SRE tools and techniques.
The Future of SRE
Summary and Conclusion
Requirements
- A general understanding of IT infrastructure.
- A general understanding of the software development process.
- Programming or scripting experience in any language.
Audience
- Developers
- System administrators
- Software architects
- DevOps engineers
- IT managers
Testimonials
How detailed subjects are explained with real-world examples.
Brian Hlabane - African Bank
Course - Site Reliability Engineering (SRE) Fundamentals
She is an expert in the area and provides really nice training. The material and training were a real mix of examples, discussion and
Peter Tutka - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Course - Site Reliability Engineering (SRE) Fundamentals
A view of SRE/DevOps from a more business/theoretical point of view. Most helpful for people who already have the practical view.
Michael Varhol - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Course - Site Reliability Engineering (SRE) Fundamentals
The approach of sending a questionnaire before the training, so that the training was planned according to expectations. It makes the participants more active.
Stefan Girman - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Course - Site Reliability Engineering (SRE) Fundamentals
Sticking to the initial survey of attendees about what the focus of the training should be.
Denis Majorsky - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Course - Site Reliability Engineering (SRE) Fundamentals
Discussions, SRE definition.
Daniel Horvath - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Course - Site Reliability Engineering (SRE) Fundamentals
The concept of the training: keeping people focused by asking them questions and triggering discussions. The group breakout sessions were also great for thinking about things in groups and seeing the different outcomes from other groups.