Module 3: Implementation Guide

Incident Response for OT

OT incident response requires different procedures, priorities, and expertise than IT incident response.

OT Incident Response Priorities

Priority Order

  1. Safety First: Protect personnel and the public
  2. Environmental Protection: Prevent environmental damage
  3. Equipment Protection: Avoid damaging expensive, long-lead-time equipment
  4. Service Continuity: Maintain power delivery to customers
  5. Evidence Preservation: Forensics (lower priority in OT than IT)

Unique OT Incident Response Challenges

Cannot Take Systems Offline

  • 24/7 operations cannot stop
  • Must maintain service during incident
  • Containment options limited
  • May need to operate with compromise present

Limited Response Tools

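  • Many IT forensics tools can crash or destabilize OT systems
  • Active vulnerability scanners can disrupt fragile controllers and should not be run against live systems
  • Host-based or inline packet capture may impact performance
  • Prefer passive monitoring (for example, a network tap or SPAN port)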

Specialized Expertise Required

  • Need OT protocol knowledge
  • Understanding of physical processes
  • Safety system expertise
  • Vendor-specific system knowledge

Incident Response Framework

1. Preparation

Before incidents occur:

Incident Response Plan

  • OT-specific procedures
  • Contact lists (internal, vendors, regulators)
  • Escalation criteria
  • Communication templates
  • Recovery procedures

Team and Training

  • Designate OT incident response team
  • Include operations and safety personnel
  • Regular tabletop exercises
  • Vendor coordination procedures

Tools and Access

  • OT-safe forensics tools
  • Passive monitoring capability
  • Secure communication channels
  • Backup systems and procedures

2. Detection and Analysis

Identifying and understanding incidents:

Detection Sources

  • Network intrusion detection
  • SCADA alarm systems
  • Operational anomalies
  • Vendor security alerts
  • Employee reports

Initial Assessment

  • What systems are affected?
  • Is it a confirmed cybersecurity incident?
  • Current operational impact?
  • Safety implications?
  • Scope and severity?

3. Containment

Limiting spread without stopping operations:

Network Containment

  • Block command and control traffic
  • Isolate affected network segments
  • Disable remote access
  • Enhanced monitoring of adjacent systems

System Containment

  • Disconnect compromised systems if safe
  • Switch to manual operations if needed
  • Deploy backup systems
  • Prevent lateral movement

Operational Containment

  • Shift to manual control where possible
  • Activate backup control center
  • Increase staffing
  • Communicate with grid operator

4. Eradication

Removing the threat from the environment:

Identify Root Cause

  • How did attacker get in?
  • What vulnerabilities were exploited?
  • Extent of compromise?
  • Persistence mechanisms?

Remove Threat

  • Restore from clean backups
  • Rebuild compromised systems
  • Patch exploited vulnerabilities
  • Change credentials
  • Verify attacker removed

5. Recovery

Returning to normal operations:

Verification

  • Systems clean of malware
  • Configuration correct
  • Safety systems tested
  • No backdoors present

Gradual Restoration

  • Test in non-production first
  • Phased return to service
  • Enhanced monitoring initially
  • Validate operations

6. Lessons Learned

Improving for the future:

Post-Incident Review

  • What happened and why?
  • What worked and what didn't?
  • How to prevent recurrence?
  • What to improve?

Implementation

  • Update procedures
  • Implement preventive controls
  • Share lessons with industry
  • Retrain staff

Incident Severity Levels

Minor (Level 1)

  • Impact: No operational impact
  • Scope: Single non-critical system
  • Response: IT/OT security team
  • Timeline: Standard response

Moderate (Level 2)

  • Impact: Limited operational impact, workarounds available
  • Scope: Multiple systems or one critical system
  • Response: Add operations personnel
  • Timeline: Expedited response

Major (Level 3)

  • Impact: Significant operational impact, manual operations required
  • Scope: Critical systems or widespread
  • Response: Full incident response team, management
  • Timeline: Immediate response, 24/7 operations

Critical (Level 4)

  • Impact: Severe safety risk, major outage, or grid instability
  • Scope: Safety systems or bulk electric system
  • Response: Executive leadership, external coordination, regulatory
  • Timeline: All-hands response, external assistance
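
The four levels above can be encoded as a small triage helper so that the same criteria are applied every time. This is a minimal sketch, not a standard: the `Assessment` fields and the exact mapping rules in `classify` are assumptions made for illustration and should be replaced by the criteria in your own incident response plan.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MODERATE = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass
class Assessment:
    # All fields are hypothetical triage inputs chosen for this sketch.
    safety_risk: bool = False                 # severe safety risk or safety system affected
    grid_impact: bool = False                 # major outage or grid instability
    manual_operations_required: bool = False  # significant operational impact
    operational_impact: bool = False          # limited impact, workarounds available
    critical_system_affected: bool = False
    systems_affected: int = 1


def classify(a: Assessment) -> Severity:
    """Return the highest severity level whose criteria are met."""
    if a.safety_risk or a.grid_impact:
        return Severity.CRITICAL
    if a.manual_operations_required:
        return Severity.MAJOR
    if a.operational_impact or a.critical_system_affected or a.systems_affected > 1:
        return Severity.MODERATE
    return Severity.MINOR


print(classify(Assessment(systems_affected=3)).name)
```

Checking the most severe criteria first means an incident that meets several levels at once is always escalated to the highest one.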

OT Incident Response Team

Core Team

  • Incident Commander: Overall coordination and decision authority
  • OT Security Lead: Technical cybersecurity expertise
  • Operations Representative: Operational impact and procedures
  • Safety Representative: Safety system expertise
  • IT Liaison: IT security coordination
  • Communications: Internal and external messaging

Extended Team (As Needed)

  • Control system vendors
  • Legal counsel
  • Public relations
  • Regulatory affairs
  • HR (if insider threat)
  • Law enforcement liaison

Communication Requirements

Regulatory Reporting

  • NERC CIP (CIP-008): Within 1 hour of determining a Reportable Cyber Security Incident
  • CISA (CIRCIA): Within 72 hours for covered cyber incidents at critical infrastructure entities
  • TSA (pipelines): Within 24 hours under the revised Security Directive (the original 2021 directive required 12 hours)
  • State/local: Per jurisdiction requirements
  • Document all notifications
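
Because the shortest window is only one hour, it can help to compute notification deadlines the moment an incident is declared reportable. The sketch below is illustrative only: `REPORTING_WINDOWS` and `reporting_deadlines` are hypothetical names, the table holds just two of the windows listed above, and the entries should be adjusted to the obligations that actually apply to your organization.

```python
from datetime import datetime, timedelta, timezone

# Example reporting windows; extend or adjust per your regulatory obligations.
REPORTING_WINDOWS = {
    "NERC CIP": timedelta(hours=1),
    "CISA": timedelta(hours=72),
}


def reporting_deadlines(determined_at, windows=REPORTING_WINDOWS):
    """Map each regulator to its latest notification time, soonest first."""
    deadlines = {name: determined_at + window for name, window in windows.items()}
    return dict(sorted(deadlines.items(), key=lambda item: item[1]))


# Time at which the incident was determined to be reportable (UTC).
determined = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
for regulator, due in reporting_deadlines(determined).items():
    print(f"{regulator}: notify by {due.isoformat()}")
```

Using timezone-aware timestamps avoids ambiguity when the response team, the control center, and the regulator are in different time zones.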

External Communications

  • Vendors (for technical support)
  • ISACs (E-ISAC, ONG-ISAC)
  • Law enforcement (if criminal)
  • Customers (if service impact)
  • Media (coordinated messaging)

Internal Communications

  • Executive management
  • Board of directors (significant incidents)
  • Operations personnel
  • IT organization
  • Affected business units

Specific Incident Scenarios

Ransomware Incident

  1. Do NOT pay ransom
  2. Isolate IT/OT boundary immediately
  3. Shift to manual operations if needed
  4. Restore from offline backups
  5. Investigate how it spread

Compromised Vendor Access

  1. Disable vendor VPN immediately
  2. Review logs for vendor activity
  3. Reset all vendor credentials
  4. Check for persistence mechanisms
  5. Coordinate vendor security review
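
Step 2 above, reviewing logs for vendor activity, can be done offline on exported logs so that nothing is run against live OT systems. The following is a minimal sketch under assumptions: the record fields (`time`, `user`, `target`), the account name `vendor_acme`, and the host names are hypothetical, and real remote-access logs would first need to be parsed into this shape.

```python
from datetime import datetime


def vendor_activity(records, vendor_accounts, since):
    """Return records for vendor accounts at or after `since`, oldest first."""
    hits = [
        r for r in records
        if r["user"] in vendor_accounts and r["time"] >= since
    ]
    return sorted(hits, key=lambda r: r["time"])


# Hypothetical, already-parsed remote-access log entries.
records = [
    {"time": datetime(2024, 3, 1, 2, 15), "user": "vendor_acme", "target": "hmi-01"},
    {"time": datetime(2024, 3, 1, 9, 0), "user": "operator7", "target": "hmi-01"},
    {"time": datetime(2024, 2, 27, 23, 40), "user": "vendor_acme", "target": "eng-ws-02"},
    {"time": datetime(2024, 1, 5, 10, 0), "user": "vendor_acme", "target": "hmi-01"},
]

for entry in vendor_activity(records, {"vendor_acme"}, since=datetime(2024, 2, 1)):
    print(entry["time"].isoformat(), entry["user"], entry["target"])
```

The resulting timeline shows which systems the vendor account touched and when, which helps scope the check for persistence mechanisms in step 4.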

Nation-State Advanced Persistent Threat

  1. Document indicators of compromise
  2. Do NOT alert attacker
  3. Enhanced monitoring to understand scope
  4. Plan a coordinated eviction
  5. Consider external incident response assistance

Recovery Priorities

Priority Order for Restoration

  1. Safety systems: First to restore and verify
  2. Critical control systems: Essential for operations
  3. Monitoring systems: Visibility into process
  4. Engineering workstations: Configuration capability
  5. Supporting systems: Historians, reporting, etc.
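
The restoration order above can be applied to an asset inventory to produce a concrete recovery queue. This is a sketch only: the category keys in `RESTORATION_PRIORITY` and the asset names in the example are hypothetical and would come from your own asset inventory.

```python
# Lower number = restore earlier, following the priority list above.
RESTORATION_PRIORITY = {
    "safety": 1,
    "critical_control": 2,
    "monitoring": 3,
    "engineering_workstation": 4,
    "supporting": 5,
}


def restoration_order(assets):
    """Sort (name, category) pairs by restoration priority.

    Python's sort is stable, so assets in the same tier keep their input order.
    """
    return sorted(assets, key=lambda asset: RESTORATION_PRIORITY[asset[1]])


# Hypothetical inventory of affected assets.
affected = [
    ("historian", "supporting"),
    ("sis-01", "safety"),
    ("eng-ws", "engineering_workstation"),
    ("plc-a", "critical_control"),
    ("hmi", "monitoring"),
    ("plc-b", "critical_control"),
]

for name, category in restoration_order(affected):
    print(f"{RESTORATION_PRIORITY[category]}. {name} ({category})")
```

An asset with an unknown category raises a `KeyError` here, which is deliberate in this sketch: an unclassified system should be reviewed by a person rather than silently placed in the queue.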

Verification Before Restoration

  • System integrity verified (no malware)
  • Configuration matches baseline
  • Logging enabled and functioning
  • Safety systems tested
  • All credentials changed
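
The checklist above works well as an explicit gate: a system returns to service only when every item has been confirmed. The sketch below assumes hypothetical check names in `REQUIRED_CHECKS` and assumes a SHA-256 digest of each exported configuration was recorded as the baseline before the incident.

```python
import hashlib

# Hypothetical names for the five checks listed above.
REQUIRED_CHECKS = (
    "integrity_verified",
    "config_matches_baseline",
    "logging_enabled",
    "safety_systems_tested",
    "credentials_changed",
)


def config_matches_baseline(current: bytes, baseline_sha256: str) -> bool:
    """Compare the SHA-256 of an exported configuration with the recorded baseline."""
    return hashlib.sha256(current).hexdigest() == baseline_sha256


def outstanding_checks(results: dict) -> list:
    """Return required checks that are missing or failed; an empty list means cleared."""
    return [name for name in REQUIRED_CHECKS if not results.get(name, False)]


# Example: the baseline digest would normally come from a trusted offline record.
exported = b"setpoint=72\nmode=auto\n"
baseline = hashlib.sha256(exported).hexdigest()

results = {
    "integrity_verified": True,
    "config_matches_baseline": config_matches_baseline(exported, baseline),
    "logging_enabled": True,
    "safety_systems_tested": False,
}
print(outstanding_checks(results))
```

Treating a missing entry the same as a failed one keeps the gate conservative: a check that nobody recorded is not assumed to have passed.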

Next Lesson: Risk assessment templates for OT environments.
