SRE Quiz: Can You Maintain 99.9% Uptime?

Questions 1–10: Fundamentals

1. What does SRE primarily focus on?
A. Writing frontend code
B. System reliability and automation
C. Designing UI/UX
D. Manual testing
Answer: B

2. How much downtime per month does 99.9% uptime allow?
A. ~43 minutes
B. ~7 hours
C. ~1 day
D. ~5 minutes
Answer: A
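
To see where the ~43 minutes comes from, here is a minimal sketch of the arithmetic, assuming a 30-day month. The function name `allowed_downtime_minutes` is made up for this illustration.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability target."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100)


print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
print(round(allowed_downtime_minutes(99.99), 2))  # 4.32
```

Each extra "nine" cuts the allowance by a factor of ten, which is why higher targets get expensive quickly.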

3. Which concept quantifies how much failure a service is allowed?
A. SLA
B. SLO
C. Error Budget
D. Latency
Answer: C
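
As a rough illustration, an error budget can be derived from an SLO and a request count. The sketch below assumes a request-based SLO. The function name and the sample numbers are hypothetical, not taken from any particular tool.

```python
def error_budget_remaining(slo_pct: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    # The budget is the number of requests the SLO permits to fail.
    allowed_failures = total_requests * (1 - slo_pct / 100)
    if allowed_failures <= 0:
        raise ValueError("no error budget to measure against")
    return 1 - failed_requests / allowed_failures


# 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
print(round(error_budget_remaining(99.9, 1_000_000, 250), 2))  # 0.75
```

A team might slow feature releases as this value approaches zero, though the exact policy is a team decision.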

4. What is an SLI?
A. Service Level Indicator
B. System Load Index
C. Server Latency Input
D. Service Log Integration
Answer: A

5. Which is an example of an SLI?
A. CPU usage
B. Request latency
C. Code coverage
D. Git commits
Answer: B
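
One common way to turn request latency into an SLI is the share of requests served within a threshold. This is a minimal sketch under that assumption. The name `latency_sli`, the 300 ms threshold, and the sample values are illustrative only.

```python
def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """Fraction of requests served within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / len(latencies_ms)


samples = [120, 95, 310, 180, 220, 90, 450, 130]
print(latency_sli(samples, threshold_ms=300))  # 0.75 (6 of 8 requests)
```

An SLO would then set a target for this number, for example "at least 99% of requests within 300 ms".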

6. What does SLA stand for?
A. Service Level Agreement
B. System Load Automation
C. Server Logic Access
D. Software Lifecycle Analysis
Answer: A

7. Which tool is commonly used for monitoring?
A. Docker
B. Kubernetes
C. Prometheus
D. Git
Answer: C

8. What is toil in SRE?
A. Automated work
B. Manual repetitive work
C. Code deployment
D. Testing framework
Answer: B

9. What is the goal of reducing toil?
A. Increase manual work
B. Improve reliability through automation
C. Reduce code quality
D. Avoid monitoring
Answer: B

10. What is latency?
A. Data size
B. Time delay in response
C. CPU speed
D. Disk usage
Answer: B

Questions 11–20: Scenario-Based

11. Your service latency spikes suddenly. What should you check first?
A. UI design
B. Monitoring dashboards
C. Marketing strategy
D. Documentation
Answer: B

12. Error rates rise above what the SLO allows. What happens?
A. Ignore it
B. The error budget is spent
C. Increase features
D. Reduce logging
Answer: B

13. A system crashes at midnight. What is your first step?
A. Sleep
B. Check alerts and logs
C. Restart randomly
D. Inform HR
Answer: B

14. Which practice helps prevent outages?
A. Ignoring logs
B. Chaos engineering
C. Manual deployments
D. No testing
Answer: B

15. What is a postmortem?
A. Code review
B. Incident analysis report
C. Deployment strategy
D. Monitoring tool
Answer: B

16. What is the purpose of alerts?
A. Notify issues
B. Store logs
C. Deploy code
D. Increase latency
Answer: A

17. Too many alerts cause what issue?
A. Alert fatigue
B. Faster systems
C. Better uptime
D. Reduced latency
Answer: A

18. What should a good alert include?
A. Random data
B. Actionable information
C. Marketing metrics
D. UI changes
Answer: B

19. What is autoscaling?
A. Manual scaling
B. Automatic resource adjustment
C. Code optimization
D. Testing strategy
Answer: B
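
For a feel of how automatic resource adjustment can work, the sketch below applies a proportional scaling rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler: scale the replica count by the ratio of observed to target metric, then round up. The function name, the min/max bounds, and the sample numbers are assumptions for illustration. Real autoscalers also add tolerances and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling: replicas grow or shrink with the metric/target ratio."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, current_metric=90, target_metric=60))   # 6 (scale out)
print(desired_replicas(4, current_metric=30, target_metric=60))   # 2 (scale in)
print(desired_replicas(4, current_metric=300, target_metric=60))  # 10 (capped)
```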

20. Which improves system resilience?
A. Single point of failure
B. Redundancy
C. No backups
D. Manual configs
Answer: B

Questions 21–30: Advanced

21. What is MTTR?
A. Mean Time To Recover
B. Maximum Time To Run
C. Minimum Test Time Required
D. Mean Test To Release
Answer: A

22. What is MTBF?
A. Mean Time Between Failures
B. Maximum Time Before Failure
C. Minimum Time Build Fail
D. Mean Test Before Fix
Answer: A
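
MTBF and MTTR together give a steady-state availability estimate: availability = MTBF / (MTBF + MTTR). A minimal sketch, with a hypothetical function name and sample figures:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recover."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# A service that fails about every 999 hours and takes 1 hour to recover:
print(availability(999, 1))  # 0.999
```

The formula shows two levers for hitting 99.9%: fail less often (raise MTBF) or recover faster (lower MTTR).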

23. Which tool is used for container orchestration?
A. Jenkins
B. Kubernetes
C. Git
D. Ansible
Answer: B

24. What is blue-green deployment?
A. Color coding
B. Two identical environments for safe release
C. UI testing
D. Logging method
Answer: B

25. What is canary deployment?
A. Full rollout
B. Gradual release to a subset of users
C. No deployment
D. Manual update
Answer: B
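
One simple way to pick the subset of users for a canary is to hash each user ID into a stable bucket. The sketch below is only an illustration of that idea. `in_canary` and the bucket scheme are assumptions, and real rollouts are often handled by a load balancer or service mesh instead.

```python
import hashlib


def in_canary(user_id: str, percent: float) -> bool:
    """Return True if this user falls inside the canary slice (percent is 0-100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user to a stable bucket in 0..9999 so the same user always gets the same answer.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
share = sum(in_canary(u, 5) for u in users) / len(users)
print(f"{share:.1%} of users routed to the canary")  # roughly 5%
```

Raising `percent` step by step widens the rollout while keeping earlier canary users in the group.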

26. What is observability?
A. Monitoring only
B. Ability to understand system state via metrics, logs, and traces
C. Logging only
D. Debugging UI
Answer: B

27. Which is NOT part of observability?
A. Metrics
B. Logs
C. Traces
D. UI colors
Answer: D

28. What is a runbook?
A. Code library
B. Step-by-step incident handling guide
C. Deployment script
D. Monitoring tool
Answer: B

29. What is a major cause of outages?
A. Automation
B. Human error
C. Monitoring
D. Alerts
Answer: B

30. What defines a reliable system?
A. No failures ever
B. Quick recovery and minimal impact
C. High cost
D. Complex design
Answer: B

Conclusion

Maintaining 99.9% uptime isn’t about avoiding failure altogether—it’s about anticipating, managing, and learning from it effectively. This quiz highlights that a strong Site Reliability Engineer (SRE) mindset goes beyond tools and theory. It requires balancing reliability with innovation, using concepts like SLIs, SLOs, and error budgets to make informed decisions.

If you scored high, you likely have a solid grasp of monitoring, incident response, and system design. If not, that’s perfectly fine: SRE is a continuous learning journey. Focus on improving your understanding of observability, automation, and resilience strategies, and most importantly, learn from real-world scenarios.

In the end, great SREs don’t just keep systems running; they build systems that recover gracefully, scale efficiently, and evolve safely.

shamitha