All,

I took a look at the failure analysis section, and perhaps I can help clarify RTUs vs. 'dial-ups', but it is hard for me to tell much about the SCADA systems they run.

Dial-ups are usually remote RTUs that are dialed from the control center and polled for data, changes, exception events, etc. This is usually a continuous process, with a bank of modems to dial hundreds of RTUs. OTOH, if the reference is simply to RTUs, the implication is usually a continuous 'always on' connection. Some are connected by Ethernet; some (the older ones) use a serial connection, usually RS-232 or RS-485. These provide real-time data, updating as often as the datapoint is configured to update in the database.

I have only 1-1/2 years in this business, so I am very much a newbie. I don't know whether the XA21 system they are using was Unix or Windows based (a pretty critical issue). I Googled the app and found a vendor with one screen shot; it showed a series of menu selections at the top of the screen with keyboard-shortcut underlines. Based on this, I would be inclined to think it is a Windows system, but I would not judge based on *one* screen shot. They said they got the system in '98, and I don't think Windows-based SCADA systems were all that prevalent then; most of the large ones were Unix flavors. So, absent knowledge of the OS, my best guess is "it depends".

Usually, SCADA data collection from the field into the database is done on redundant dedicated servers, with the control room using "view" or "remote" nodes. Also, I can see where the alarming process is critical in a huge EMS system. I would hazard another guess that this process was running on server(s) separate from the main SCADA database servers, to offload processing. But maybe not.

So, when alarming failed, the data collection process continued to update the operator screens (that is, no data points went "stale", which would have tipped them off to an issue).
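
To make the "stale" idea concrete, here is a minimal Python sketch. It is not taken from XA21 or from the report; it only assumes that each data point carries a last-update time and a configured scan rate, and that a point counts as stale once it has missed a few expected updates. Applying the same check to a process heartbeat is roughly what an external monitoring application could do. Every name here (Timestamped, overdue, find_overdue, the point and process names, the tolerance of 3) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Timestamped:
    """A data point or a process heartbeat (hypothetical structure)."""
    name: str
    last_seen: float       # time of last update, in seconds
    expected_every: float  # configured scan rate or heartbeat period, in seconds


def overdue(item: Timestamped, now: float, tolerance: float = 3.0) -> bool:
    """True if the item's age exceeds `tolerance` expected update periods."""
    return (now - item.last_seen) > tolerance * item.expected_every


def find_overdue(items, now, tolerance=3.0):
    """Return the names of all items that are overdue at time `now`."""
    return [item.name for item in items if overdue(item, now, tolerance)]


now = 1000.0

# Field data keeps arriving, so no point looks stale on the operator screens.
points = [
    Timestamped("pump_1_flow", last_seen=998.0, expected_every=2.0),
    Timestamped("tank_4_level", last_seen=995.0, expected_every=5.0),
]

# Assumed heartbeats: data collection is alive, the alarm process has hung.
processes = [
    Timestamped("data_collection", last_seen=999.0, expected_every=5.0),
    Timestamped("alarm_processor", last_seen=400.0, expected_every=5.0),
]

stale_points = find_overdue(points, now)
hung_processes = find_overdue(processes, now)

print("stale points:", stale_points)
print("hung processes:", hung_processes)
```

In this made-up scenario the point check reports nothing, because data collection is healthy; only a separate check on the alarm process itself would notice the hang.
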
And since the alarming processes hung, the sys admins would not know that the process had failed unless they had an external monitoring application running. OTOH, when the control room suddenly went quiet (seems to me it would, given the failure of the alarming process, as control rooms are constantly going pingpingping or dingdingding, with red or yellow highlights flashing behind alarm events on the console screens), MY curiosity would certainly be piqued.

The failover and subsequent failure of the standby server might support the case for a hung or corrupt alarm process. Since alarming is a critical process, when the standby server started and could not "acquire" the process, it failed as well. Makes sense. And a hard boot of both systems would probably kick-start the alarming process, if it was running on the primary/backup SCADA servers. But yes, it would take a while (ever watched a four-processor, multi-GB RAM server boot, find the controllers, LUNs, and drives, and start all the apps? Takes quite a while.) So, I can see where operations would not want a system-wide cold boot.

Things in the world of water treatment and distribution tend to happen slowly compared to the world of electricity generation and distribution, where I would think things happen much faster (something like 2/3 the speed of light? Anyway, the rate at which a disturbance travels along the wires, not the electrons themselves).

The report seemed very high level to me, and I can kinda tell what went on in the control rooms, but there are many gaps in my knowledge/experience.

I know I rambled quite a bit, but I hope this helps,

Rick Bertolett
Austin Water Utility SCADA