Practical Prompt Examples for SRE Post-Incident Analysis

Post-incident analysis is how Site Reliability Engineers (SREs) turn an outage into concrete improvements in system resilience and reduce the likelihood of a repeat. A fixed set of prompts keeps the review structured, so the team covers the timeline, causes, impact, response, communication, and follow-up rather than only the most visible failure. The seven prompts below can serve as an agenda for your post-incident reviews.

Prompt 1: Incident Timeline Reconstruction

Describe the sequence of events leading up to the incident. Include timestamps, system states, and user reports. What was the first sign of the issue, and how did it escalate?
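As a rough illustration of this prompt, the sketch below merges events from several sources into one ordered timeline. The `Event` type, the source labels, and all sample data are invented for the example; in practice the events would come from whatever alerting, deployment, and support tools the team uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # timezone-aware, stored in UTC
    source: str          # hypothetical labels: "alert", "deploy", "user_report"
    description: str


def build_timeline(*sources):
    """Merge events from several sources into one list ordered by time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.timestamp)


def format_timeline(events):
    """Render one line per event, with the offset from the first event."""
    if not events:
        return []
    start = events[0].timestamp
    lines = []
    for e in events:
        offset_min = int((e.timestamp - start).total_seconds() // 60)
        lines.append(
            f"{e.timestamp:%H:%M} (+{offset_min}m) [{e.source}] {e.description}"
        )
    return lines


def utc(hour, minute):
    # Helper for the invented sample data below.
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Invented sample data, purely for illustration.
alerts = [Event(utc(10, 12), "alert", "p99 latency above threshold")]
deploys = [Event(utc(10, 5), "deploy", "config change rolled out")]
reports = [Event(utc(10, 20), "user_report", "checkout page timing out")]

timeline = build_timeline(alerts, deploys, reports)
for line in format_timeline(timeline):
    print(line)
```

Keeping every timestamp in a single time zone (UTC here) avoids ordering mistakes when sources report in different local times. The first entry in the merged list is a natural candidate for "the first sign of the issue", though it still needs human review.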

Prompt 2: Root Cause Identification

What was the underlying cause of the incident? Consider both technical failures and process gaps. Was there a specific code change, configuration error, or external factor?
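One possible starting point for this prompt is to list the changes that landed shortly before the incident began. The sketch below assumes a hypothetical change log represented as dictionaries with `at` and `summary` keys, and an arbitrary two-hour lookback window; both are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta, timezone


def changes_before(changes, incident_start, lookback=timedelta(hours=2)):
    """Return changes that landed within `lookback` before the incident, newest first."""
    window_start = incident_start - lookback
    recent = [c for c in changes if window_start <= c["at"] <= incident_start]
    return sorted(recent, key=lambda c: c["at"], reverse=True)


def utc(hour, minute):
    # Helper for the invented sample data below.
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Invented change log, purely for illustration.
change_log = [
    {"at": utc(7, 30), "summary": "deploy: api v1.8.2"},
    {"at": utc(9, 40), "summary": "config: raise connection pool size"},
    {"at": utc(10, 2), "summary": "flag: enable new checkout flow"},
    {"at": utc(10, 30), "summary": "rollback: disable new checkout flow"},
]

candidates = changes_before(change_log, incident_start=utc(10, 5))
for change in candidates:
    print(f"{change['at']:%H:%M} {change['summary']}")
```

A change that is close in time to the incident is only a candidate, not a confirmed cause. The list is meant to focus the discussion, and process gaps or external factors will not appear in it at all.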

Prompt 3: Impact Assessment

Assess the scope and severity of the impact. Which users, services, or regions were affected? How long did the outage last, and what was the business impact?
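The scope and duration questions above can be made concrete with a few numbers. The following is a minimal sketch, assuming the team can obtain per-region counts of affected and total users (or requests); the region names and figures are invented, and business impact is left out because it depends on data this example does not have.

```python
from datetime import datetime, timezone


def summarize_impact(start, end, affected_by_region, total_by_region):
    """Compute outage duration and the share of users affected, per region and overall."""
    share_by_region = {}
    for region, total in total_by_region.items():
        affected = affected_by_region.get(region, 0)
        share_by_region[region] = affected / total if total else 0.0

    total_all = sum(total_by_region.values())
    affected_all = sum(affected_by_region.get(r, 0) for r in total_by_region)
    return {
        "duration_minutes": (end - start).total_seconds() / 60,
        "share_by_region": share_by_region,
        "overall_share": affected_all / total_all if total_all else 0.0,
    }


# Invented figures, purely for illustration.
impact = summarize_impact(
    start=datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 10, 50, tzinfo=timezone.utc),
    affected_by_region={"eu-west": 1200, "us-east": 300},
    total_by_region={"eu-west": 4000, "us-east": 6000},
)

print(f"Outage lasted {impact['duration_minutes']:.0f} minutes")
for region, share in impact["share_by_region"].items():
    print(f"{region}: {share:.1%} of users affected")
print(f"Overall: {impact['overall_share']:.1%} of users affected")
```

Reporting per-region shares alongside the overall figure helps show whether the incident was broad and shallow or concentrated in one place.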

Prompt 4: Detection and Response Evaluation

Evaluate the effectiveness of detection mechanisms and response actions. Were alerts timely and accurate? Did the team follow established incident response procedures?
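Whether alerts were timely can be judged more easily with explicit intervals. The sketch below derives time to detect, time to acknowledge, and time to resolve from four timestamps. The field names and sample values are assumptions made for the example; teams define these milestones in different ways.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IncidentTimes:
    started_at: datetime       # when the problem actually began
    detected_at: datetime      # when the first alert or report arrived
    acknowledged_at: datetime  # when a responder picked it up
    resolved_at: datetime      # when service was restored


def response_metrics(times):
    """Return the key detection and response intervals as timedeltas."""
    return {
        "time_to_detect": times.detected_at - times.started_at,
        "time_to_acknowledge": times.acknowledged_at - times.detected_at,
        "time_to_resolve": times.resolved_at - times.started_at,
    }


def utc(hour, minute):
    # Helper for the invented sample data below.
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Invented timestamps, purely for illustration.
incident = IncidentTimes(
    started_at=utc(10, 5),
    detected_at=utc(10, 12),
    acknowledged_at=utc(10, 15),
    resolved_at=utc(10, 50),
)

metrics = response_metrics(incident)
for name, value in metrics.items():
    print(f"{name}: {value.total_seconds() / 60:.0f} min")
```

Comparing these intervals across incidents can show whether detection or response is the slower step, which helps decide where improvements to alerting or procedures would matter most.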

Prompt 5: Lessons Learned and Preventative Measures

Identify key lessons from the incident. What changes can be made to monitoring, alerting, or infrastructure? How will these improvements reduce the risk of recurrence?

Prompt 6: Communication and Stakeholder Engagement

Review how communication was handled during the incident. Were stakeholders kept informed? What can be improved in future communications?

Prompt 7: Documentation and Follow-up

Document all findings clearly. Give each follow-up action an owner and a due date. How will the team track progress on improvements?
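Tracking progress is easier when follow-up actions are recorded in a structured form. The following is a minimal sketch, assuming each action has a title, an owner, a due date, and a done flag; the `ActionItem` type and the sample items are hypothetical, and a real team would more likely keep this data in its issue tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    done: bool = False


def overdue(items, today):
    """Return open items whose due date has already passed."""
    return [item for item in items if not item.done and item.due < today]


def completion_rate(items):
    """Return the fraction of items marked done (0.0 for an empty list)."""
    if not items:
        return 0.0
    return sum(1 for item in items if item.done) / len(items)


# Invented action items, purely for illustration.
actions = [
    ActionItem("Add latency alert on checkout", "alice", date(2024, 1, 20), done=True),
    ActionItem("Add canary stage for config changes", "bob", date(2024, 1, 25)),
    ActionItem("Update escalation runbook", "carol", date(2024, 2, 10)),
]

today = date(2024, 2, 1)
print(f"Completed: {completion_rate(actions):.0%}")
for item in overdue(actions, today):
    print(f"OVERDUE: {item.title} (owner: {item.owner}, due {item.due})")
```

Reviewing the overdue list at a regular meeting is one way to keep follow-up actions from being forgotten once the incident is no longer fresh.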

Conclusion

Working through these prompts in each post-incident review helps a team move from describing what happened to agreeing on what to change. Encourage open discussion, thorough analysis, and continuous learning to build resilient infrastructure.