Incident Response for OT

OT incident response requires different procedures, priorities, and expertise than IT incident response.

OT Incident Response Priorities

Priority Order

Safety First: Protect people and environment
Environmental Protection: Prevent environmental damage
Equipment Protection: Avoid damaging expensive, long-lead-time equipment
Service Continuity: Maintain power delivery to customers
Evidence Preservation: Forensics (lower priority in OT than IT)

Unique OT Incident Response Challenges

Cannot Take Systems Offline

24/7 operations cannot stop
Must maintain service during incident
Containment options limited
May need to operate with compromise present

Limited Response Tools

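Many IT forensics tools can crash or destabilize OT devices
Active vulnerability scanning is often unsafe on live control systems
Packet capture running on OT hosts may impact performance
Prefer passive monitoring (SPAN ports, network taps)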
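Passive analysis can be as simple as reading a capture taken from a SPAN port or network tap on a separate analysis workstation. The sketch below is a minimal example using only the Python standard library: it counts packets sent to a few well-known OT ports in a classic-format pcap file. The port map, the function name count_ot_flows, and the Ethernet/IPv4-only parsing are assumptions for illustration; pcapng files, VLAN tags, and IPv6 are not handled, and a full protocol analyzer is the practical choice for real investigations.

```python
import struct
from collections import Counter

# Well-known OT service ports (TCP); an illustrative subset, extend as needed.
OT_PORTS = {502: "Modbus/TCP", 20000: "DNP3", 102: "S7comm/IEC 61850 MMS", 44818: "EtherNet/IP"}


def count_ot_flows(pcap: bytes) -> Counter:
    """Count TCP packets per OT destination port in a classic pcap capture.

    Runs offline on a copy of the traffic, so nothing is sent to the control network.
    """
    magic = struct.unpack("<I", pcap[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", pcap[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")

    counts: Counter = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap):
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Ethernet II carrying IPv4 (EtherType 0x0800); VLAN tags are not handled
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
        if frame[23] != 6:  # IPv4 protocol field: 6 = TCP
            continue
        tcp = frame[14 + ihl:]
        if len(tcp) < 4:
            continue
        dst_port = struct.unpack("!H", tcp[2:4])[0]
        if dst_port in OT_PORTS:
            counts[OT_PORTS[dst_port]] += 1
    return counts
```

Because the capture file is parsed offline, the analysis adds no load and no traffic to the production network.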

Specialized Expertise Required

Need OT protocol knowledge
Understanding of physical processes
Safety system expertise
Vendor-specific system knowledge

Incident Response Framework

1. Preparation

Before incidents occur:

Incident Response Plan

OT-specific procedures
Contact lists (internal, vendors, regulators)
Escalation criteria
Communication templates
Recovery procedures

Team and Training

Designate OT incident response team
Include operations and safety personnel
Regular tabletop exercises
Vendor coordination procedures

Tools and Access

OT-safe forensics tools
Passive monitoring capability
Secure communication channels
Backup systems and procedures
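Backups only help in recovery if they can be trusted. One common approach, sketched here under the assumption that backups are stored as files in a directory, is to record a SHA-256 hash of every file when the backup is taken, keep that manifest offline, and re-check it before restoring. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its hash; store the result offline."""
    return {
        str(p.relative_to(backup_dir)): sha256_file(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing, modified, or absent from the manifest."""
    current = build_manifest(backup_dir)
    problems = [name for name, h in manifest.items() if current.get(name) != h]
    problems += [name for name in current if name not in manifest]
    return sorted(problems)
```

An empty list from verify_manifest means every file matches the hash recorded when the backup was made.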

2. Detection and Analysis

Identifying and understanding incidents:

Detection Sources

Network intrusion detection
SCADA alarm systems
Operational anomalies
Vendor security alerts
Employee reports
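Operational anomalies are often the first visible sign of an OT incident. As an illustrative sketch only, a rolling z-score can flag a process reading that departs sharply from its recent history. The AnomalyDetector class, the 60-sample window, and the threshold of 4 standard deviations are assumptions; real processes have legitimate step changes (switching operations, load swings), so a check like this supplements engineered SCADA alarms rather than replacing them.

```python
from collections import deque
from statistics import mean, stdev


class AnomalyDetector:
    """Flag readings far from the recent baseline using a rolling z-score."""

    def __init__(self, window: int = 60, threshold: float = 4.0):
        self.history: deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def check(self, value: float) -> bool:
        """Return True if value is anomalous relative to the recent window."""
        anomalous = False
        if len(self.history) >= 10:  # wait until a minimal baseline exists
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        if not anomalous:
            self.history.append(value)  # keep outliers out of the baseline
        return anomalous
```

Excluding flagged values from the window keeps a sustained attack from quietly becoming the new baseline, at the cost of needing a manual reset after a legitimate operating change.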

Initial Assessment

What systems are affected?
Is it a confirmed cybersecurity incident?
What is the current operational impact?
Are there safety implications?
What are the scope and severity?

3. Containment

Limiting spread without stopping operations:

Network Containment

Block command and control traffic
Isolate affected network segments
Disable remote access
Enhanced monitoring of adjacent systems

System Containment

Disconnect compromised systems if safe
Switch to manual operations if needed
Deploy backup systems
Prevent lateral movement
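The "if safe" and "if needed" qualifiers above amount to a decision rule. A hypothetical encoding of that rule follows; the Asset fields and the returned action strings are assumptions for illustration, and in practice the operations and safety representatives make the final call.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    safety_system: bool         # part of a safety instrumented system?
    manual_fallback: bool       # can operators run the process by hand?
    confirmed_compromise: bool  # analysis has confirmed malicious activity


def containment_action(asset: Asset) -> str:
    """Suggest a containment step; it is advice for the team, not an automated action."""
    if asset.safety_system:
        # Never change a safety system without safety sign-off
        return "escalate to safety representative before any change"
    if not asset.confirmed_compromise:
        return "enhanced monitoring"
    if asset.manual_fallback:
        return "switch to manual operations, then disconnect"
    # No safe way to take it offline: limit exposure while it keeps running
    return "restrict network access and monitor; keep in service"
```

Checking safety first mirrors the priority order at the top of this lesson: the rule never trades safety for faster containment.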

Operational Containment

Shift to manual control where possible
Activate backup control center
Increase staffing
Communicate with grid operator

4. Eradication

Removing the threat from the environment:

Identify Root Cause

How did attacker get in?
What vulnerabilities were exploited?
What is the extent of the compromise?
What persistence mechanisms are in place?

Remove Threat

Restore from clean backups
Rebuild compromised systems
Patch exploited vulnerabilities
Change credentials
Verify attacker removed
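When changing credentials, replacement passwords should come from a cryptographically secure source rather than a predictable pattern. A minimal sketch using Python's secrets module follows; the alphabet, the default length of 20, and the account names are assumptions, and some legacy OT devices limit password length or character set, so adjust per device.

```python
import secrets
import string

# Assumed character set; trim it for devices that reject certain symbols.
ALPHABET = string.ascii_letters + string.digits + "!#%+-=?@"


def new_password(length: int = 20) -> str:
    """Generate a random password using the OS cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def rotate_credentials(accounts: list[str], length: int = 20) -> dict[str, str]:
    """Return a fresh password per account.

    Deliver the results out of band, never over the network being remediated.
    """
    return {account: new_password(length) for account in accounts}
```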

5. Recovery

Returning to normal operations:

Verification

Systems clean of malware
Configuration correct
Safety systems tested
No backdoors present
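One way to confirm that configuration is correct is to compare a current configuration export (for example, a text export from a controller's engineering software) against an approved baseline. The sketch uses difflib.unified_diff from the standard library; the text export format, the config_drift name, and the setpoint names in the usage lines are hypothetical.

```python
import difflib


def config_drift(baseline: str, current: str, name: str = "config") -> list[str]:
    """Return unified-diff lines between an approved baseline and a current export.

    An empty list means the two exports are identical.
    """
    return list(difflib.unified_diff(
        baseline.splitlines(),
        current.splitlines(),
        fromfile=f"{name} (baseline)",
        tofile=f"{name} (current)",
        lineterm="",
    ))


# Usage: a changed high setpoint shows up as a -/+ pair in the diff.
baseline = "setpoint_high=120\nsetpoint_low=80\n"
current = "setpoint_high=180\nsetpoint_low=80\n"
for line in config_drift(baseline, current, "pump_station_plc"):
    print(line)
```

A text diff only covers what the export contains; logic or firmware changes that the export does not capture need their own comparison.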

Gradual Restoration

Test in non-production first
Phased return to service
Enhanced monitoring initially
Validate operations

6. Lessons Learned

Improving for the future:

Post-Incident Review

What happened and why?
What worked and what didn't?
How to prevent recurrence?
What to improve?

Implementation

Update procedures
Implement preventive controls
Share lessons with industry
Retrain staff

Incident Severity Levels

Minor (Level 1)

Impact: No operational impact
Scope: Single non-critical system
Response: IT/OT security team
Timeline: Standard response

Moderate (Level 2)

Impact: Limited operational impact, workarounds available
Scope: Multiple systems or one critical system
Response: Add operations personnel
Timeline: Expedited response

Major (Level 3)

Impact: Significant operational impact, manual operations required
Scope: Critical systems or widespread
Response: Full incident response team, management
Timeline: Immediate response, 24/7 operations

Critical (Level 4)

Impact: Severe safety risk, major outage, or grid instability
Scope: Safety systems or bulk electric system
Response: Executive leadership, external coordination, regulatory
Timeline: All-hands response, external assistance
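The severity criteria above can be expressed as a classification that checks the most severe conditions first. This is a simplified, hypothetical mapping: the Assessment fields are assumptions, and the authoritative criteria should come from your own incident response plan.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    safety_risk: bool               # severe safety risk or safety system affected
    bulk_electric_impact: bool      # major outage or grid instability
    manual_ops_required: bool       # significant impact forcing manual operations
    critical_system_affected: bool
    systems_affected: int
    operational_impact: bool        # any impact, even with workarounds


def severity_level(a: Assessment) -> int:
    """Map an initial assessment to Levels 1-4, worst case first."""
    if a.safety_risk or a.bulk_electric_impact:
        return 4  # Critical
    if a.manual_ops_required:
        return 3  # Major
    if a.operational_impact or a.critical_system_affected or a.systems_affected > 1:
        return 2  # Moderate
    return 1      # Minor
```

Evaluating from Level 4 downward means an incident that meets several levels' criteria is always assigned the highest one.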

OT Incident Response Team

Core Team

Incident Commander: Overall coordination and decision authority
OT Security Lead: Technical cybersecurity expertise
Operations Representative: Operational impact and procedures
Safety Representative: Safety system expertise
IT Liaison: IT security coordination
Communications: Internal and external messaging

Extended Team (As Needed)

Control system vendors
Legal counsel
Public relations
Regulatory affairs
HR (if an insider threat is suspected)
Law enforcement liaison

Communication Requirements

Regulatory Reporting

NERC CIP-008: Within 1 hour of determining a Reportable Cyber Security Incident (to E-ISAC and CISA)
CISA (CIRCIA): Within 72 hours for covered cyber incidents at critical infrastructure entities, once CISA's implementing rule takes effect
TSA (pipelines): Within 12 hours
State/local: Per jurisdiction requirements
Document all notifications
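Because notification clocks start running at determination, it helps to compute deadlines immediately. A sketch using the time windows listed above follows; the dictionary keys and the reporting_deadlines function are assumptions, and each window should be confirmed against the current text of the regulation, since reporting rules change and each defines its own trigger.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the list above; verify against current regulatory text.
REPORTING_WINDOWS = {
    "NERC CIP": timedelta(hours=1),
    "TSA": timedelta(hours=12),
    "CISA": timedelta(hours=72),
}


def reporting_deadlines(determined_at: datetime, regimes: list[str]) -> dict[str, datetime]:
    """Return the notification deadline for each applicable regime."""
    if determined_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp to avoid ambiguity")
    return {r: determined_at + REPORTING_WINDOWS[r] for r in regimes}


# Usage: list deadlines soonest first.
t0 = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
for regime, due in sorted(reporting_deadlines(t0, ["CISA", "NERC CIP"]).items(), key=lambda kv: kv[1]):
    print(f"{regime}: notify by {due.isoformat()}")
```

Requiring a timezone-aware timestamp avoids off-by-hours errors when responders in different time zones log the same determination.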

External Communications

Vendors (for technical support)
ISACs (E-ISAC, ONG-ISAC)
Law enforcement (if criminal)
Customers (if service impact)
Media (coordinated messaging)

Internal Communications

Executive management
Board of directors (significant incidents)
Operations personnel
IT organization
Affected business units

Specific Incident Scenarios

Ransomware Incident

Do NOT pay ransom
Isolate IT/OT boundary immediately
Shift to manual operations if needed
Restore from offline backups
Investigate how it spread

Compromised Vendor Access

Disable vendor VPN immediately
Review logs for vendor activity
Reset all vendor credentials
Check for persistence mechanisms
Coordinate vendor security review
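Reviewing logs for vendor activity usually means filtering remote-access logs by vendor account and time window. The log format, regular expression, and vendor_activity function below are hypothetical; adapt the pattern to whatever your VPN concentrator or jump host actually records.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical format: "<ISO-8601 timestamp> <event text> user=<name> [src=<ip>]"
LOG_RE = re.compile(r"^(?P<ts>\S+) (?P<event>.+?) user=(?P<user>\S+)(?: src=(?P<src>\S+))?$")


def vendor_activity(
    lines: Iterable[str], vendor_accounts: set, start: datetime, end: datetime
) -> Iterator[Tuple[datetime, str, str, Optional[str]]]:
    """Yield (timestamp, user, event, src) for vendor accounts within [start, end]."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or m["user"] not in vendor_accounts:
            continue  # unparseable line or non-vendor account
        ts = datetime.fromisoformat(m["ts"].replace("Z", "+00:00"))
        if start <= ts <= end:
            yield ts, m["user"], m["event"], m["src"]
```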

Nation-State Advanced Persistent Threat

Document indicators of compromise
Do NOT alert attacker
Enhanced monitoring to understand scope
Plan and execute a coordinated eviction
Consider external incident response assistance
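To avoid alerting the attacker, indicators of compromise can be checked against flow records that have already been collected, rather than by probing attacker infrastructure. A minimal sketch using the standard ipaddress module follows; the (src, dst) flow tuple format and the function names are assumptions, and the example addresses come from documentation ranges.

```python
import ipaddress


def load_iocs(entries: list[str]) -> list:
    """Parse IOC strings as networks; a bare address becomes a /32 (or /128)."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def match_iocs(flows: list[tuple[str, str]], iocs: list) -> list[tuple[str, str]]:
    """Return flows whose source or destination falls inside any IOC network.

    Works only on records already collected, so nothing is sent toward the attacker.
    """
    hits = []
    for src, dst in flows:
        for addr in (src, dst):
            ip = ipaddress.ip_address(addr)
            if any(ip in net for net in iocs):  # mixed IPv4/IPv6 simply never match
                hits.append((src, dst))
                break
    return hits
```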

Recovery Priorities

Priority Order for Restoration

Safety systems: First to restore and verify
Critical control systems: Essential for operations
Monitoring systems: Visibility into process
Engineering workstations: Configuration capability
Supporting systems: Historians, reporting, etc.
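The restoration order can be enforced as a gate: a tier opens only after every system in the higher-priority tiers has been verified. The tier names mirror the list above; the next_to_restore function and the system names are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    SAFETY = 1
    CRITICAL_CONTROL = 2
    MONITORING = 3
    ENGINEERING = 4
    SUPPORTING = 5


def next_to_restore(systems: dict[str, Tier], verified: set[str]) -> list[str]:
    """Return systems eligible now: those in the highest-priority unfinished tier."""
    pending = {name: tier for name, tier in systems.items() if name not in verified}
    if not pending:
        return []  # everything restored and verified
    current = min(pending.values())  # lowest number = highest priority
    return sorted(name for name, tier in pending.items() if tier == current)
```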

Verification Before Restoration

System integrity verified (no malware)
Configuration matches baseline
Logging enabled and functioning
Safety systems tested
All credentials changed
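The verification list can serve as a hard gate before a system returns to service. A minimal sketch follows; the check names are assumptions mapped one-to-one onto the list above, and a result that was never recorded is treated as a failure.

```python
# One entry per item in the verification list above (names are illustrative).
REQUIRED_CHECKS = (
    "integrity_verified",
    "config_matches_baseline",
    "logging_enabled",
    "safety_systems_tested",
    "credentials_changed",
)


def ready_to_restore(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ready, outstanding checks); a missing result counts as failed."""
    outstanding = [c for c in REQUIRED_CHECKS if not results.get(c, False)]
    return (not outstanding, outstanding)
```

Treating an unrecorded check as failed means a skipped step blocks restoration instead of passing silently.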

Next Lesson: Risk assessment templates for OT environments.