Incident Post-Mortem Report Template
Incident Summary
- Incident ID: [incident_id]
- Component: [component_id]
- Domain: [domain]
- Severity: [severity level]
- Status: [active/resolved]
- Duration: [start_time] to [end_time]
Timeline
Detection
- Time: [timestamp]
- Detected by: [probe/classifier/manual]
- Initial state: [state]
Escalation
- Time: [timestamp]
- Escalated to: [team/person]
- Reason: [reason]
Resolution
- Time: [timestamp]
- Resolved by: [team/person]
- Resolution method: [method]
Root Cause Analysis
Primary Cause
[Description of primary root cause]
Contributing Factors
- [Factor 1]
- [Factor 2]
- [Factor 3]
Root Cause Hints
[From health event root_cause_hints]
Impact
Affected Systems
- [System 1]
- [System 2]
User Impact
[Description of user impact]
Business Impact
[Description of business impact]
Recovery Actions Taken
- Action: [action_name]
  - Time: [timestamp]
  - Result: [success/failure]
  - Notes: [notes]
- Action: [action_name]
  - Time: [timestamp]
  - Result: [success/failure]
  - Notes: [notes]
Lessons Learned
What Went Well
- [Item 1]
- [Item 2]
What Could Be Improved
- [Item 1]
- [Item 2]
Action Items
- [Action item 1]
- [Action item 2]
Recommendations
- [Recommendation 1]
- [Recommendation 2]
- [Recommendation 3]
Metrics
- MTTR (Mean Time To Recover): [time]
- MTBF (Mean Time Between Failures): [time]
- Recovery Success Rate: [percentage]
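
One possible way to fill in the three metrics above, sketched in Python. The template does not specify how these figures are calculated, so the definitions here are assumptions: MTTR is taken as the average of (end_time - start_time) across incidents, MTBF as the average gap between the end of one incident and the start of the next, and Recovery Success Rate as the share of recovery actions whose result is "success". All function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average incident duration, where each incident is a (start, end) pair."""
    durations = [end - start for start, end in incidents]
    return sum(durations, timedelta()) / len(durations)


def mean_time_between_failures(incidents):
    """Average gap between one incident's end and the next incident's start.

    Assumes at least two incidents and that incidents do not overlap.
    """
    ordered = sorted(incidents)
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


def recovery_success_rate(results):
    """Percentage of recovery actions recorded as "success"."""
    successes = sum(1 for result in results if result == "success")
    return 100.0 * successes / len(results)


# Hypothetical sample data, standing in for values taken from the
# Duration field and the Recovery Actions Taken section.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 3, 10, 30), datetime(2024, 1, 3, 12, 0)),
]
results = ["success", "failure", "success", "success"]

mttr = mean_time_to_recover(incidents)
mtbf = mean_time_between_failures(incidents)
rate = recovery_success_rate(results)
```

With the sample data, the two incidents last 30 and 90 minutes, so MTTR comes out at one hour; there is a single two-day gap between them, so MTBF is two days; and three of four actions succeeded, giving a 75% success rate. Teams that define MTBF as total operating time divided by the number of failures would need to adjust the second function accordingly.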