Incident Response for OT
OT incident response requires different procedures, priorities, and expertise than IT incident response.
OT Incident Response Priorities
Priority Order
- Safety First: Protect the lives and health of personnel and the public
- Environmental Protection: Prevent environmental damage
- Equipment Protection: Avoid damaging expensive, long-lead-time equipment
- Service Continuity: Maintain power delivery to customers
- Evidence Preservation: Collect forensic evidence (a lower priority in OT than in IT)
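One way to make this ordering operational is to encode it and reject any response action that endangers a higher-ranked priority than the one it serves. The sketch below is illustrative only: the `Priority` enum and the `is_acceptable` helper are hypothetical names, and the rule itself is an assumption about how a team might apply the list.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower value means higher priority, following the order listed above."""
    SAFETY = 1
    ENVIRONMENT = 2
    EQUIPMENT = 3
    SERVICE_CONTINUITY = 4
    EVIDENCE = 5


def is_acceptable(serves: Priority, risks: list[Priority]) -> bool:
    """Accept an action only if everything it puts at risk ranks below what it serves."""
    return all(risk > serves for risk in risks)


# Imaging a live controller for forensics could interrupt service: rejected.
print(is_acceptable(Priority.EVIDENCE, [Priority.SERVICE_CONTINUITY]))
# Tripping a unit to protect people costs service continuity: accepted.
print(is_acceptable(Priority.SAFETY, [Priority.SERVICE_CONTINUITY]))
```

In practice this judgment belongs to the incident commander and operations staff; the code only shows why evidence collection yields to everything above it.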
Unique OT Incident Response Challenges
Cannot Take Systems Offline
- 24/7 operations cannot stop
- Must maintain service during incident
- Containment options limited
- May need to operate with compromise present
Limited Response Tools
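- Many IT forensics tools can crash fragile OT systems
- Active vulnerability scanners can disrupt controllers and are usually unsafe to run
- Packet capture on production devices may degrade performance
- Favor passive monitoring (for example, network taps or SPAN ports)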
Specialized Expertise Required
- Need OT protocol knowledge
- Understanding of physical processes
- Safety system expertise
- Vendor-specific system knowledge
Incident Response Framework
1. Preparation
Before incidents occur:
Incident Response Plan
- OT-specific procedures
- Contact lists (internal, vendors, regulators)
- Escalation criteria
- Communication templates
- Recovery procedures
Team and Training
- Designate OT incident response team
- Include operations and safety personnel
- Regular tabletop exercises
- Vendor coordination procedures
Tools and Access
- OT-safe forensics tools
- Passive monitoring capability
- Secure communication channels
- Backup systems and procedures
2. Detection and Analysis
Identifying and understanding incidents:
Detection Sources
- Network intrusion detection
- SCADA alarm systems
- Operational anomalies
- Vendor security alerts
- Employee reports
Initial Assessment
- What systems are affected?
- Is it a confirmed cybersecurity incident?
- Current operational impact?
- Safety implications?
- Scope and severity?
3. Containment
Limiting spread without stopping operations:
Network Containment
- Block command and control traffic
- Isolate affected network segments
- Disable remote access
- Enhanced monitoring of adjacent systems
System Containment
- Disconnect compromised systems if safe
- Switch to manual operations if needed
- Deploy backup systems
- Prevent lateral movement
Operational Containment
- Shift to manual control where possible
- Activate backup control center
- Increase staffing
- Communicate with grid operator
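The containment options above depend on whether an action is safe and whether operators can fall back to manual control. A minimal decision sketch follows, assuming a simplified `Asset` record and action labels that are hypothetical and not part of any standard; real decisions need sign-off from operations and safety staff.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    compromised: bool
    safety_critical: bool   # part of, or required by, a safety function
    manual_fallback: bool   # operators can run the process by hand


def containment_action(asset: Asset) -> str:
    """Suggest a containment step; the rules here are illustrative assumptions."""
    if not asset.compromised:
        return "enhanced_monitoring"
    if asset.safety_critical:
        # Never disconnect a safety-related system without safety sign-off.
        return "escalate_to_safety_and_operations"
    if asset.manual_fallback:
        return "disconnect_and_switch_to_manual"
    # No safe way to remove the asset: contain around it instead.
    return "isolate_segment_and_monitor"


hmi = Asset("feeder-hmi", compromised=True, safety_critical=False, manual_fallback=True)
print(containment_action(hmi))
```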
4. Eradication
Removing the threat from the environment:
Identify Root Cause
- How did attacker get in?
- What vulnerabilities were exploited?
- Extent of compromise?
- Persistence mechanisms?
Remove Threat
- Restore from clean backups
- Rebuild compromised systems
- Patch exploited vulnerabilities
- Change credentials
- Verify attacker removed
5. Recovery
Returning to normal operations:
Verification
- Systems clean of malware
- Configuration correct
- Safety systems tested
- No backdoors present
Gradual Restoration
- Test in non-production first
- Phased return to service
- Enhanced monitoring initially
- Validate operations
6. Lessons Learned
Improving for the future:
Post-Incident Review
- What happened and why?
- What worked and what didn't?
- How to prevent recurrence?
- What to improve?
Implementation
- Update procedures
- Implement preventive controls
- Share lessons with industry
- Retrain staff
Incident Severity Levels
Minor (Level 1)
- Impact: No operational impact
- Scope: Single non-critical system
- Response: IT/OT security team
- Timeline: Standard response
Moderate (Level 2)
- Impact: Limited operational impact, workarounds available
- Scope: Multiple systems or one critical system
- Response: Add operations personnel
- Timeline: Expedited response
Major (Level 3)
- Impact: Significant operational impact, manual operations required
- Scope: Critical systems or widespread
- Response: Full incident response team, management
- Timeline: Immediate response, 24/7 operations
Critical (Level 4)
- Impact: Severe safety risk, major outage, or grid instability
- Scope: Safety systems or bulk electric system
- Response: Executive leadership, external coordination, regulatory notification
- Timeline: All-hands response, external assistance
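The four levels can be read as a lookup on two inputs, impact and scope. The sketch below assumes the incident takes the higher of the two ratings; that tie-breaking rule, and the enum names, are assumptions made for illustration rather than something the levels above prescribe.

```python
from enum import IntEnum


class Impact(IntEnum):
    NONE = 1          # no operational impact
    LIMITED = 2       # limited impact, workarounds available
    SIGNIFICANT = 3   # manual operations required
    SEVERE = 4        # safety risk, major outage, or grid instability


class Scope(IntEnum):
    SINGLE_NONCRITICAL = 1
    MULTIPLE_OR_ONE_CRITICAL = 2
    CRITICAL_OR_WIDESPREAD = 3
    SAFETY_OR_BULK_ELECTRIC = 4


LEVEL_NAMES = {1: "Minor", 2: "Moderate", 3: "Major", 4: "Critical"}


def severity_level(impact: Impact, scope: Scope) -> int:
    """Return the incident level, taking the worse of impact and scope (an assumption)."""
    return max(int(impact), int(scope))


level = severity_level(Impact.NONE, Scope.MULTIPLE_OR_ONE_CRITICAL)
print(level, LEVEL_NAMES[level])
```

Erring toward the higher level means a one-system compromise of a critical asset still brings in operations personnel, even before any operational impact is visible.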
OT Incident Response Team
Core Team
- Incident Commander: Overall coordination and decision authority
- OT Security Lead: Technical cybersecurity expertise
- Operations Representative: Operational impact and procedures
- Safety Representative: Safety system expertise
- IT Liaison: IT security coordination
- Communications: Internal and external messaging
Extended Team (As Needed)
- Control system vendors
- Legal counsel
- Public relations
- Regulatory affairs
- HR (if insider threat)
- Law enforcement liaison
Communication Requirements
Regulatory Reporting
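- NERC CIP (CIP-008): Within 1 hour of determining that an incident is reportable
- CISA: Within 72 hours for covered incidents at critical infrastructure entities (per CIRCIA)
- TSA (pipelines): Within 24 hours under the current security directive (the original 2021 directive required 12 hours)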
- State/local: Per jurisdiction requirements
- Document all notifications
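Because each reporting clock starts when the incident is determined to be reportable, it can help to compute the deadlines once and track which notifications are still outstanding. The sketch below uses hypothetical helper names, and the windows passed in are inputs that should be confirmed against the current regulatory text rather than taken from this example.

```python
from datetime import datetime, timedelta, timezone


def reporting_deadlines(determined_at: datetime, windows_hours: dict[str, int]) -> dict[str, datetime]:
    """Map each regulator to its deadline, counted from the determination time."""
    return {name: determined_at + timedelta(hours=h) for name, h in windows_hours.items()}


def overdue(deadlines: dict[str, datetime], now: datetime, notified: set[str]) -> list[str]:
    """List regulators whose deadline has passed without a recorded notification."""
    return sorted(name for name, due in deadlines.items()
                  if name not in notified and now > due)


determined = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
deadlines = reporting_deadlines(determined, {"NERC CIP": 1, "CISA": 72})
for name, due in deadlines.items():
    print(name, due.isoformat())
```

Keeping the `notified` set up to date also gives a simple record that supports the requirement to document all notifications.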
External Communications
- Vendors (for technical support)
- ISACs (E-ISAC, ONG-ISAC)
- Law enforcement (if criminal)
- Customers (if service impact)
- Media (coordinated messaging)
Internal Communications
- Executive management
- Board of directors (significant incidents)
- Operations personnel
- IT organization
- Affected business units
Specific Incident Scenarios
Ransomware Incident
- Do NOT pay ransom
- Sever connections across the IT/OT boundary immediately
- Shift to manual operations if needed
- Restore from offline backups
- Investigate how it spread
Compromised Vendor Access
- Disable vendor VPN immediately
- Review logs for vendor activity
- Reset all vendor credentials
- Check for persistence mechanisms
- Coordinate vendor security review
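Reviewing logs for vendor activity usually starts with pulling every entry for vendor accounts and flagging sessions outside expected hours. The sketch below assumes a hypothetical CSV log layout, a `vendor_` account prefix, and a 07:00 to 18:00 support window; none of these come from a real product, so adapt them to the actual VPN or jump-host logs.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample log; real formats will differ.
SAMPLE_LOG = """timestamp,account,source_ip,action
2024-03-01T02:14:00,vendor_acme,203.0.113.7,login
2024-03-01T02:20:00,vendor_acme,203.0.113.7,download_project
2024-03-01T09:00:00,operator1,10.0.5.12,login
2024-03-02T14:05:00,vendor_acme,198.51.100.4,login
"""


def vendor_activity(log_text: str, account_prefix: str = "vendor_") -> list[dict]:
    """Return all log rows whose account name starts with the vendor prefix."""
    reader = csv.DictReader(io.StringIO(log_text))
    return [row for row in reader if row["account"].startswith(account_prefix)]


def off_hours(rows: list[dict], start_hour: int = 7, end_hour: int = 18) -> list[dict]:
    """Keep rows whose timestamp falls outside the expected support window."""
    return [r for r in rows
            if not start_hour <= datetime.fromisoformat(r["timestamp"]).hour < end_hour]


rows = vendor_activity(SAMPLE_LOG)
for r in off_hours(rows):
    print(r["timestamp"], r["account"], r["source_ip"], r["action"])
```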
Nation-State Advanced Persistent Threat
- Document indicators of compromise
- Do NOT alert attacker
- Enhanced monitoring to understand scope
- Plan a coordinated eviction
- Consider external incident response assistance
Recovery Priorities
Priority Order for Restoration
- Safety systems: First to restore and verify
- Critical control systems: Essential for operations
- Monitoring systems: Visibility into process
- Engineering workstations: Configuration capability
- Supporting systems: Historians, reporting, etc.
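With an asset inventory tagged by category, the restoration order above can be applied mechanically. A minimal sketch, assuming hypothetical category labels and system names:

```python
# Category labels are hypothetical; they mirror the order listed above.
RESTORE_ORDER = [
    "safety",
    "critical_control",
    "monitoring",
    "engineering_workstation",
    "supporting",
]


def restoration_sequence(systems: list[tuple[str, str]]) -> list[str]:
    """Sort (name, category) pairs into restoration order and return the names."""
    rank = {category: i for i, category in enumerate(RESTORE_ORDER)}
    # sorted() is stable, so systems in the same category keep their input order.
    return [name for name, category in sorted(systems, key=lambda s: rank[s[1]])]


inventory = [
    ("historian-01", "supporting"),
    ("sis-plc-a", "safety"),
    ("eng-ws-3", "engineering_workstation"),
    ("scada-primary", "critical_control"),
    ("alarm-server", "monitoring"),
]
print(restoration_sequence(inventory))
```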
Verification Before Restoration
- System integrity verified (no malware)
- Configuration matches baseline
- Logging enabled and functioning
- Safety systems tested
- All credentials changed
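These checks work as a gate: a system returns to service only when every item passes. The sketch below is one possible shape for that gate, assuming the configuration baseline is stored as a SHA-256 digest; the check names and helper functions are hypothetical.

```python
import hashlib

# One entry per item in the checklist above.
CHECKS = (
    "integrity_verified",
    "config_matches_baseline",
    "logging_enabled",
    "safety_systems_tested",
    "credentials_changed",
)


def config_matches_baseline(config_bytes: bytes, baseline_sha256: str) -> bool:
    """Compare the exported configuration against a known-good SHA-256 digest."""
    return hashlib.sha256(config_bytes).hexdigest() == baseline_sha256


def ready_to_restore(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ready, failed_checks); a missing check counts as failed."""
    failed = [check for check in CHECKS if not results.get(check, False)]
    return (not failed, failed)


baseline = hashlib.sha256(b"setpoint=50\n").hexdigest()
results = {
    "integrity_verified": True,
    "config_matches_baseline": config_matches_baseline(b"setpoint=50\n", baseline),
    "logging_enabled": True,
    "safety_systems_tested": True,
    "credentials_changed": False,
}
print(ready_to_restore(results))
```

Treating a missing result as a failure keeps an unperformed check from silently passing the gate.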
Next Lesson: Risk assessment templates for OT environments.