Incident Post-Mortem Report Template

Incident Summary

  • Incident ID: [incident_id]
  • Component: [component_id]
  • Domain: [domain]
  • Severity: [severity level]
  • Status: [active/resolved]
  • Duration: [start_time] to [end_time]

Timeline

Detection

  • Time: [timestamp]
  • Detected by: [probe/classifier/manual]
  • Initial state: [state]

Escalation

  • Time: [timestamp]
  • Escalated to: [team/person]
  • Reason: [reason]

Resolution

  • Time: [timestamp]
  • Resolved by: [team/person]
  • Resolution method: [method]
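
When filling in the three timeline timestamps, it can help to check that they are in order and to derive the time spent in each phase. The following is a minimal sketch, not part of the template itself. It assumes the timestamps are ISO 8601 strings that all share the same timezone convention; the function name timeline_intervals and the dictionary keys are hypothetical.

```python
from datetime import datetime


def timeline_intervals(detected: str, escalated: str, resolved: str) -> dict:
    """Validate the order of the timeline timestamps and return phase durations.

    Assumption: each argument is an ISO 8601 string such as
    "2024-05-01T10:00:00", and all three use the same timezone convention.
    """
    d, e, r = (datetime.fromisoformat(t) for t in (detected, escalated, resolved))
    if not d <= e <= r:
        raise ValueError("timeline timestamps are out of order")
    return {
        "detection_to_escalation": e - d,
        "escalation_to_resolution": r - e,
        "detection_to_resolution": r - d,
    }


if __name__ == "__main__":
    phases = timeline_intervals(
        "2024-05-01T10:00:00", "2024-05-01T10:15:00", "2024-05-01T11:00:00"
    )
    for name, delta in phases.items():
        print(f"{name}: {delta}")
```

An out-of-order timeline raises ValueError, which tends to catch copy-and-paste mistakes before the report is circulated.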

Root Cause Analysis

Primary Cause

[Description of primary root cause]

Contributing Factors

  1. [Factor 1]
  2. [Factor 2]
  3. [Factor 3]

Root Cause Hints

[Hints taken from the health event's root_cause_hints]

Impact

Affected Systems

  • [System 1]
  • [System 2]

User Impact

[Description of user impact]

Business Impact

[Description of business impact]

Recovery Actions Taken

  1. Action: [action_name]

    • Time: [timestamp]
    • Result: [success/failure]
    • Notes: [notes]

  2. Action: [action_name]

    • Time: [timestamp]
    • Result: [success/failure]
    • Notes: [notes]

Lessons Learned

What Went Well

  • [Item 1]
  • [Item 2]

What Could Be Improved

  • [Item 1]
  • [Item 2]

Action Items

  • [Action item 1]
  • [Action item 2]

Recommendations

  1. [Recommendation 1]
  2. [Recommendation 2]
  3. [Recommendation 3]

Metrics

  • MTTR (Mean Time To Recover): [time]
  • MTBF (Mean Time Between Failures): [time]
  • Recovery Success Rate: [percentage]
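
The template does not say how these three values are derived, so the sketch below uses the conventional definitions as an assumption: MTTR is total downtime divided by the number of incidents, MTBF is total uptime over an observation period divided by the number of failures, and Recovery Success Rate is the share of recovery actions whose result was "success". The function names and input shapes are hypothetical.

```python
from datetime import timedelta


def mttr(downtimes: list) -> timedelta:
    """Mean time to recover: total downtime / number of incidents."""
    if not downtimes:
        raise ValueError("at least one incident is required")
    return sum(downtimes, timedelta()) / len(downtimes)


def mtbf(observation_period: timedelta, downtimes: list) -> timedelta:
    """Mean time between failures: total uptime / number of failures.

    Assumption: uptime is the observation period minus total downtime.
    """
    if not downtimes:
        raise ValueError("at least one failure is required")
    uptime = observation_period - sum(downtimes, timedelta())
    return uptime / len(downtimes)


def recovery_success_rate(results: list) -> float:
    """Percentage of recovery actions whose result was "success"."""
    if not results:
        raise ValueError("at least one recovery action is required")
    successes = sum(1 for r in results if r == "success")
    return 100.0 * successes / len(results)


if __name__ == "__main__":
    downtimes = [timedelta(minutes=30), timedelta(minutes=90)]
    print("MTTR:", mttr(downtimes))
    print("MTBF:", mtbf(timedelta(hours=24), downtimes))
    print("Recovery Success Rate:",
          recovery_success_rate(["success", "failure", "success", "success"]))
```

MTTR and MTBF are averages over several incidents, so for a single report they are usually computed from the component's incident history rather than from this incident alone.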
