2011-01-27

Security Basics: Control System Forensics

(This article was originally published on the Findings From the Field blog.)

Most network administrators recognize the term computer forensics as the discipline of collecting evidence from computers for use in court. What may not be apparent is that computer forensics practices and technologies are also useful tools for general trouble-shooting. Forensic records are detailed enough to identify the cause of intrusions and other causes for litigation. As a result, these records are almost always detailed enough to identify causes of other kinds of problems, from performance anomalies to operator and administrator errors and omissions. But what kinds of real-time forensics are appropriate to deploy on industrial control systems?

Control System Forensics

There are few forensics tools designed specifically for industrial control systems - the number of sites investing in control-system forensics is still too small. Fortunately, many general-purpose forensics tools work well with most modern control systems, though they may not add as much value to the oldest and most heavily-customized proprietary systems. Common forensics tools include:
  • A reference clock system - synchronized clocks make it possible to reliably compare timestamps on events and so track down chains of causality.
  • Operating system activity logs and audit logs - modern UNIX and Windows operating systems support detailed logging of both routine and abnormal events, though such logging is generally not enabled by default and must be turned on.
  • Application audit logs - increasingly, control system applications also support detailed audit logging, though again such logging is generally not the default. Audit logs tell you who did what, even when those operations are normal, authorized operations.
  • Central log aggregation - storing log entries on a secure central system, such as a Security Event Manager or log aggregator, makes all of your logs accessible in one convenient user interface. Secure central logging also makes tampering with logs much more difficult if an intruder or a piece of malware is trying to erase evidence of its activity.
  • Host configuration monitoring - some tools, like the Industrial Defender Host Intrusion Detection System, show you not just what is happening on a system, but also notify you of suspicious changes, such as changes to important files, new account creation, and suspicious running processes.
  • Network monitoring - many kinds of network monitoring are possible, and almost all of them add value in forensic investigations and advanced trouble-shooting. Network intrusion detection is the basic network monitoring function. In addition, you might consider capturing and archiving, for a period of weeks, all traffic exchanged with important controllers, or using techniques like the Digital Bond Quickdraw rules, which are included as part of the Industrial Defender Network Sensor ruleset, to generate an audit log for PLCs and other devices which do not support auditing themselves.
Note: With all of these approaches, it is important to consult with the vendor supporting your control system to ensure that none of these monitoring technologies will adversely affect the safe and correct operation of your system. In addition, any changes to control system equipment must first be tested and validated on a test-bed designed to minimize adverse impacts on production systems. Some enterprise-focused monitoring technologies are too intrusive to deploy safely on control system equipment, so testing and vendor consultation are vital steps in deploying forensics technologies.
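
To illustrate why synchronized clocks and central log aggregation matter together, here is a minimal Python sketch that merges log lines from several hosts into one timeline. It is not taken from any product mentioned above. The log format (an ISO-8601 timestamp with a UTC offset, a space, then a message), the host names, and the function names parse_entry and build_timeline are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_entry(host, line):
    """Split 'TIMESTAMP message' into (UTC datetime, host, message).

    Assumes an ISO-8601 timestamp with an explicit UTC offset, for example
    2011-01-27T09:15:02-05:00. This format is hypothetical.
    """
    stamp, message = line.split(" ", 1)
    when = datetime.fromisoformat(stamp)
    if when.tzinfo is None:
        # Without an offset, entries from different hosts cannot be
        # compared reliably, so refuse to guess.
        raise ValueError(f"timestamp has no UTC offset: {stamp}")
    return when.astimezone(timezone.utc), host, message


def build_timeline(logs_by_host):
    """Merge per-host log lines into one list ordered by UTC time."""
    entries = [
        parse_entry(host, line)
        for host, lines in logs_by_host.items()
        for line in lines
    ]
    return sorted(entries, key=lambda entry: entry[0])


if __name__ == "__main__":
    # Hypothetical sample data: two hosts logging in different time zones.
    sample = {
        "hmi-01": [
            "2011-01-27T09:15:02-05:00 operator login",
            "2011-01-27T09:15:40-05:00 setpoint changed",
        ],
        "historian": [
            "2011-01-27T14:15:20+00:00 new account created",
        ],
    }
    for when, host, message in build_timeline(sample):
        print(when.isoformat(), host, message)
```

The ordering is only as trustworthy as the clocks behind the timestamps. If one host's clock has drifted, its events land in the wrong place in the chain of causality, which is why the reference clock comes first in the list above.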

Legal Issues

In most jurisdictions, there are laws governing how much host and network information an organization or individual can capture and what use can be made of captured information. As a rule, laws applicable to control systems are simpler than laws which apply to internet service providers or social networking sites, but there are laws all the same. Capturing data without proper authorizations and without proper notification and consent is generally a violation of corporate security policies, and is occasionally a violation of the laws of the land. When establishing a forensics and monitoring program, you need to consult with your legal advisers and ensure that proper authorizations and other mechanisms are in place. More detail on legal issues is available in several of the resources listed at the end of this article.

Incident Response and Planning

No discussion of forensics is complete without mention of incident response and incident response planning. Incident response is more than having a set of technologies and tools available and knowing how to use them. Incident response starts with a plan - identify your most valuable assets, identify the ways those assets could be compromised, and put a plan together to respond to each compromise scenario. For example, the response to a "low and slow" intelligence-gathering attack on non-critical assets at a nuclear reactor site might involve law enforcement experts and a strategy of monitoring and investigation. The objective might be to deceive the adversary until they are identified and apprehended. Contrast that with the response to common malware that has compromised a critical alarm server - the strategy there might be a much faster "unplug, image, rebuild, and redeploy" response, because of the threat to the availability of the control system.

At some point, many incident response plans will require the capture of a forensic data set with a data-gathering toolkit. Such toolkits are described in some depth in most detailed resources on forensics. The toolkits generally consist of removable media hosting a variety of tools:
  • tools to use on a running, potentially compromised or otherwise malfunctioning system, to capture volatile data, such as the contents of memory and system status information such as running processes and open files and network connections,
  • bootable media to capture hard disk images and images of suspect removable media for later analysis,
  • and any tools which may be unique to your environment, such as PLC programming tools sufficient to make copies of device configurations for later analysis.
Incident response teams must be trained in how to use these tools, and in how to document and secure any captured information. Specific "chain of custody" procedures must be followed if a forensic analysis is to be accepted as evidence in court proceedings.
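
One small part of documenting captured information can be sketched in Python: recording a cryptographic hash of each captured file at acquisition time, so that later analysis can show the file has not changed. This is a simplified illustration, not a complete or legally sufficient chain-of-custody procedure. The record fields and the function names sha256_of, custody_record, and verify are hypothetical, and actual procedures should come from your legal advisers and forensics resources.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large disk images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, collector, action):
    """Build one custody log entry for a captured file.

    The field names are illustrative assumptions, not a standard.
    """
    return {
        "file": str(path),
        "sha256": sha256_of(path),
        "collector": collector,
        "action": action,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }


def verify(path, record):
    """Return True if the file still matches the hash in the record."""
    return sha256_of(path) == record["sha256"]
```

A responder might call custody_record() immediately after imaging a disk and store the result away from the evidence itself. An analyst would then call verify() before starting work on the image.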

For more routine investigations, full chain-of-custody measures may not strictly be needed, but chain-of-custody discipline helps ensure that all the information needed for later analysis is captured. Without this first-responder discipline, it is easy for response teams to focus on quickly repairing damaged or compromised systems, without gathering enough information for analysis. The result is a control system which is quickly restored, but then fails again later on, since not enough data was gathered to determine and correct the root cause of the original failure.

It is worth re-emphasizing: incident response is more than a forensics toolkit. The first step is almost always a rapid escalation to security experts to determine what kind of attack the organization is facing, and to select an appropriate, pre-defined response plan for that kind of attack. Many response plans will involve contacting local or federal law enforcement agencies.

Looking Forward

Many resources are available to anyone who would like to know more about incident response and designing security systems in support of forensics:
Incident response plans and forensics should be part of every security program. Planning for security and performance incidents means that your teams are ready and practiced when incidents occur, resulting in less down-time for control systems while important data is captured and recovery plans are carried out. Designing your security program to capture important information for later analysis is essential to identifying and correcting root causes of security incidents as well as performance and reliability incidents.
