Questions 1–10: Fundamentals
1. What does SRE primarily focus on?
A. Writing frontend code
B. System reliability and automation
C. Designing UI/UX
D. Manual testing
✅ Answer: B
2. How much downtime per month does 99.9% uptime allow?
A. ~43 minutes
B. ~7 hours
C. ~1 day
D. ~5 minutes
✅ Answer: A
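Worked example for question 2: the ~43 minutes figure follows from simple arithmetic. The sketch below assumes a 30-day month; the function name `allowed_downtime_minutes` is made up for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100)


# 99.9% over an assumed 30-day month
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

The same function shows why each extra "nine" is expensive: at 99.99% the allowance shrinks to roughly 4.3 minutes per month.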
3. Which concept defines how much system failure is acceptable?
A. SLA
B. SLO
C. Error Budget
D. Latency
✅ Answer: C
4. What is an SLI?
A. Service Level Indicator
B. System Load Index
C. Server Latency Input
D. Service Log Integration
✅ Answer: A
5. Which is an example of an SLI?
A. CPU usage
B. Request latency
C. Code coverage
D. Git commits
✅ Answer: B
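Worked example for question 5: one common way to turn request latency into an SLI is the fraction of requests served within a threshold. This is a minimal sketch, not a standard API; the function name `latency_sli`, the 300 ms threshold, and the sample data are all assumptions chosen for illustration.

```python
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within `threshold_ms`."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / len(latencies_ms)


# Hypothetical sample: 10 requests, 2 of them slower than 300 ms
sample = [120, 80, 310, 95, 240, 60, 150, 500, 90, 110]
print(latency_sli(sample, threshold_ms=300))  # 0.8
```

An SLO would then set a target for this number, for example "at least 99% of requests under 300 ms".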
6. What does SLA stand for?
A. Service Level Agreement
B. System Load Automation
C. Server Logic Access
D. Software Lifecycle Analysis
✅ Answer: A
7. Which tool is commonly used for monitoring?
A. Docker
B. Kubernetes
C. Prometheus
D. Git
✅ Answer: C
8. What is toil in SRE?
A. Automated work
B. Manual repetitive work
C. Code deployment
D. Testing framework
✅ Answer: B
9. What is the goal of reducing toil?
A. Increase manual work
B. Improve reliability through automation
C. Reduce code quality
D. Avoid monitoring
✅ Answer: B
10. What is latency?
A. Data size
B. Time delay in response
C. CPU speed
D. Disk usage
✅ Answer: B
Questions 11–20: Scenario-Based
11. Your service latency spikes suddenly. What should you check first?
A. UI design
B. Monitoring dashboards
C. Marketing strategy
D. Documentation
✅ Answer: B
12. Error rates increase beyond the SLO target. What happens?
A. Ignore it
B. Spend error budget
C. Increase features
D. Reduce logging
✅ Answer: B
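Worked example for question 12: the error budget is the share of requests the SLO allows to fail, and every failed request spends part of it. The sketch below assumes a request-based SLO; `error_budget_report` and its dictionary keys are hypothetical names, and the request counts are made-up numbers.

```python
def error_budget_report(slo_pct: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise how much of the error budget has been spent in a window."""
    allowed_failures = total_requests * (100 - slo_pct) / 100
    if allowed_failures <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed,
        "exhausted": consumed >= 1.0,   # budget fully spent
    }


# Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO
report = error_budget_report(99.9, 1_000_000, 400)
print(round(report["consumed_fraction"], 2))  # 0.4
print(report["exhausted"])                    # False
```

Once the budget is exhausted, many teams pause feature releases and prioritise reliability work until it recovers.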
13. A system crashes at midnight. What is your first step?
A. Sleep
B. Check alerts and logs
C. Restart randomly
D. Inform HR
✅ Answer: B
14. Which practice helps prevent outages?
A. Ignoring logs
B. Chaos engineering
C. Manual deployments
D. No testing
✅ Answer: B
15. What is a postmortem?
A. Code review
B. Incident analysis report
C. Deployment strategy
D. Monitoring tool
✅ Answer: B
16. What is the purpose of alerts?
A. Notify issues
B. Store logs
C. Deploy code
D. Increase latency
✅ Answer: A
17. Too many alerts cause what issue?
A. Alert fatigue
B. Faster systems
C. Better uptime
D. Reduced latency
✅ Answer: A
18. What should a good alert include?
A. Random data
B. Actionable information
C. Marketing metrics
D. UI changes
✅ Answer: B
19. What is autoscaling?
A. Manual scaling
B. Automatic resource adjustment
C. Code optimization
D. Testing strategy
✅ Answer: B
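Worked example for question 19: autoscalers typically compare a current metric against a target and adjust the replica count proportionally. The sketch below uses the ratio rule documented for the Kubernetes Horizontal Pod Autoscaler, ceil(currentReplicas × currentMetric / targetMetric), but it is a simplification: it ignores tolerance bands and stabilization windows. The function name and the default min/max bounds are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU against a 60% target: scale out
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
# 4 replicas averaging 20% CPU against a 60% target: scale in
print(desired_replicas(4, current_metric=20, target_metric=60))  # 2
```

The clamp matters in practice: without an upper bound, a metric spike could request far more capacity than the cluster or the budget can support.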
20. Which improves system resilience?
A. Single point of failure
B. Redundancy
C. No backups
D. Manual configs
✅ Answer: B
Questions 21–30: Advanced
21. What is MTTR?
A. Mean Time To Recover
B. Maximum Time To Run
C. Minimum Test Time Required
D. Mean Test To Release
✅ Answer: A
22. What is MTBF?
A. Mean Time Between Failures
B. Maximum Time Before Failure
C. Minimum Time Build Fail
D. Mean Test Before Fix
✅ Answer: A
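Worked example for questions 21 and 22: MTBF and MTTR combine into steady-state availability via the standard relation availability = MTBF / (MTBF + MTTR). The sketch below assumes both values are given in hours; the function name and the sample numbers are illustrative only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recover."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service: fails about every 999 hours, takes 1 hour to recover
print(round(availability(999, 1) * 100, 2))    # 99.9
# Halving recovery time improves availability without changing failure rate
print(round(availability(999, 0.5) * 100, 2))  # 99.95
```

This is why SRE teams invest in faster recovery: lowering MTTR raises availability even when failures are just as frequent.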
23. Which tool is used for container orchestration?
A. Jenkins
B. Kubernetes
C. Git
D. Ansible
✅ Answer: B
24. What is blue-green deployment?
A. Color coding
B. Two identical environments for safe release
C. UI testing
D. Logging method
✅ Answer: B
25. What is canary deployment?
A. Full rollout
B. Gradual release to a subset of users
C. No deployment
D. Manual update
✅ Answer: B
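Worked example for question 25: one simple way to pick the "subset of users" for a canary is to hash each user ID into a stable bucket from 0 to 99 and admit buckets below the rollout percentage. This is a sketch of that idea, not how any particular deployment tool does it; `in_canary` and the user IDs are hypothetical.

```python
import hashlib


def in_canary(user_id: str, rollout_pct: int) -> bool:
    """Deterministically assign a user to the canary for a given rollout percentage."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < rollout_pct


# Count how many of 1,000 hypothetical users land in a 5% canary
users = [f"user-{i}" for i in range(1000)]
print(sum(in_canary(u, 5) for u in users))
```

Because the bucket depends only on the user ID, a user who sees the new version at 5% keeps seeing it as the rollout widens to 25% or 100%, which keeps the experience consistent.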
26. What is observability?
A. Monitoring only
B. Ability to understand system state via metrics, logs, and traces
C. Logging only
D. Debugging UI
✅ Answer: B
27. Which is NOT part of observability?
A. Metrics
B. Logs
C. Traces
D. UI colors
✅ Answer: D
28. What is a runbook?
A. Code library
B. Step-by-step incident handling guide
C. Deployment script
D. Monitoring tool
✅ Answer: B
29. What is a major cause of outages?
A. Automation
B. Human error
C. Monitoring
D. Alerts
✅ Answer: B
30. What defines a reliable system?
A. No failures ever
B. Quick recovery and minimal impact
C. High cost
D. Complex design
✅ Answer: B
Conclusion
Maintaining 99.9% uptime isn’t about avoiding failure altogether—it’s about anticipating, managing, and learning from it effectively. This quiz highlights that a strong Site Reliability Engineer (SRE) mindset goes beyond tools and theory. It requires balancing reliability with innovation, using concepts like SLIs, SLOs, and error budgets to make informed decisions.
If you scored high, you likely have a solid grasp of monitoring, incident response, and system design. If not, that’s perfectly fine: SRE is a continuous learning journey. Focus on improving your understanding of observability, automation, and resilience strategies, and most importantly, learn from real-world scenarios.
In the end, great SREs don’t just keep systems running; they build systems that recover gracefully, scale efficiently, and evolve safely.



