RE: DOE Releases Interim Report on Blackouts/Power Outages, Focus on Cyber Security

Richard.Bertolett@ci.austin.tx.us · Fri, 21 Nov 2003 14:19:03 -0600

All,

I took a look at the failure analysis section, and perhaps I can help
clarify RTUs vs. 'dialups', but it is hard for me to tell much about the
SCADA Systems they run. 

Dial-ups are usually remote RTUs that are dialed from the control center and
polled for data/changes/exception events, etc.  This is usually a continuous
process, with a bank of modems to dial hundreds of RTUs.
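
For what it's worth, a rough sketch of how a modem bank might work through a
list of dial-up RTUs.  The function names, the RTU ids, and the numbers are
all made up for illustration; a real SCADA front end will do this differently
(retries, priorities, exception reporting, etc.).

```python
import math
from collections import deque


def poll_rounds(rtu_ids, modem_count):
    """Group RTUs into dialing rounds, one RTU per free modem per round."""
    if modem_count < 1:
        raise ValueError("need at least one modem")
    queue = deque(rtu_ids)
    rounds = []
    while queue:
        # Each modem in the bank takes the next RTU waiting in the queue.
        batch = [queue.popleft() for _ in range(min(modem_count, len(queue)))]
        rounds.append(batch)
    return rounds


def cycle_seconds(rtu_count, modem_count, seconds_per_call):
    """Rough time for one full pass, assuming every call takes the same time."""
    return math.ceil(rtu_count / modem_count) * seconds_per_call


# Hypothetical ids: five RTUs shared between two modems.
print(poll_rounds(["rtu1", "rtu2", "rtu3", "rtu4", "rtu5"], 2))
# Hypothetical sizing: 300 RTUs, 10 modems, 30 seconds per call.
print(cycle_seconds(300, 10, 30))
```

The point is just that with dial-ups the data is only as fresh as the last
full pass through the list.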

OTOH, if the reference is simply to RTUs, the implication is usually a
continuous 'always on' connection.  Some are by Ethernet, some (the older
ones) are a serial connection, usually RS-232 or RS-485.  These provide
realtime data, updating as often as the datapoint is configured in the
database.
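
A minimal sketch of how a "stale" point could be flagged when each datapoint
has its own configured update interval.  The point names, the timestamps, and
the grace factor of 2 are my assumptions, not anything taken from a
particular SCADA product.

```python
def stale_points(points, now, grace=2.0):
    """Return names of points whose last update is older than grace * interval.

    points maps a point name to (last_update_time, update_interval_seconds).
    """
    stale = []
    for name, (last_update, interval) in points.items():
        if now - last_update > grace * interval:
            stale.append(name)
    return sorted(stale)


# Hypothetical points: (time of last update, configured interval in seconds).
points = {
    "pump1_flow": (100.0, 5.0),
    "tank2_level": (40.0, 10.0),
}
print(stale_points(points, now=105.0))  # prints ['tank2_level']
```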

I have only 1-1/2 years in this business, so I am very much a newbie.  I
don't know if the XA21 system they are using was Unix or Windows based
(a pretty critical issue).  I googled the app and found a vendor with one
screen shot; there was a series of menu selections at the top of the screen
with keyboard shortcut underlines.  Based on this, I would be inclined to
think this is a Windows system, but I would not judge based on *one* screen
shot.  They said they got the system in '98, and I don't think Windows-based
SCADA systems were all that prevalent then; most of the large ones were Unix
flavors.  So, absent knowledge of the OS, my best guess is "it depends".

Usually, SCADA data collection from the field into the database is done on
redundant dedicated servers, with the control room using "view" or "remote"
nodes.  Also, I can see where the alarming process is critical in a huge EMS
system.  I would hazard another guess that this process was running on
server(s) separate from the main SCADA database servers, to offload
processing.  But maybe not.  So, when alarming failed, the data collection
process continued to update the operator screens (that is, no data points
went "stale", which would have tipped them off to an issue).  And since the
alarming process hung, the sys admins would not know that it had failed
unless they had an external monitoring application running.
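
By "external monitoring" I mean something like a heartbeat watchdog.  Here is
a minimal sketch, assuming the alarm process can be made to call beat() on
every pass through its loop.  The class name and the 30 second timeout are
hypothetical; I have no idea what the actual system provides.

```python
import time


class HeartbeatWatchdog:
    """Flags a process as hung if it has not checked in within the timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so it can be tested
        self._last_beat = clock()

    def beat(self):
        """Called by the monitored process each time it completes a pass."""
        self._last_beat = self._clock()

    def is_hung(self):
        """True once the last heartbeat is older than the timeout."""
        return self._clock() - self._last_beat > self.timeout_seconds


# Hypothetical usage: the monitor checks this on a timer and pages someone.
watchdog = HeartbeatWatchdog(timeout_seconds=30)
watchdog.beat()
print(watchdog.is_hung())
```

A hung process still shows up as "running" in the process list, which is why
the check has to be on what the process last did, not on whether it exists.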

OTOH, when the control room suddenly went quiet (seems to me it would, based
on the failure of the alarming process, as control rooms are constantly
going pingpingping or dingdingding, with red or yellow highlights flashing
behind alarm events on the console screens), MY curiosity would certainly be
piqued.  

The failover and subsequent failure of the standby server might support the
case for a hung or corrupt alarm process.  Alarming is a critical process, so
when the standby server started and could not "acquire" the process, it
failed as well.  Makes sense.  And a hard boot of both systems would probably
kick-start the alarming process, if it was running on the primary/backup
SCADA servers.  But yes, it would take a while (ever watched a
four-processor/mega GB RAM server boot and find the controllers, LUNs, and
drives, and start all the apps?  Takes quite a while).

So, I can see where operations would not want a system-wide cold-boot.
Things in the world of water treatment and distribution tend to happen
slowly compared to the world of electricity generation and distribution,
where I would think things happen much faster (something like 2/3 the speed
of light?  Anyway, the rate at which the signal travels over the wires).

The report seemed very high level to me, and I can kinda tell what went on
in the control rooms, but there are many gaps in my knowledge/experience.

I know I rambled quite a bit, but I hope this helps,
Rick Bertolett
Austin Water Utility SCADA