Re: duplicate a variable!!!!

> Realistically, radiation causes enough bitflips in DRAM cells that most
> servers have ECC memory. It causes so few bitflips in processors that
> most systems ignore that possibility - other risks greatly outweigh
> processor bitflips. When you are talking about something that is so
> safety-critical (or cost-critical, or high-risk - such as in space) that
> it is a real concern, then you duplicate or triplicate the processor,
> and/or you use radiation-hardened devices, and/or you use external
> shielding, and/or you use specialised processor designs with ECC right
> through to the register level.


... and then, on top of that, you duplicate your variables into different parts of RAM and 
do a bit comparison before you use them for some purpose, just in case you have the chance 
to catch something - and perform a safety reaction - in time to avert disaster.
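A minimal sketch of what I mean, in C. The names (dup_u32, dup_write, dup_read) are made up 
for illustration and are not from any real library; placing each copy in a different RAM 
region would normally be done through the linker script and is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical protected variable: two copies of the same value.
 * In a real system each copy would live in a different RAM region
 * (for example via separate linker sections); that placement is
 * omitted in this sketch. */
typedef struct {
    volatile uint32_t primary;
    volatile uint32_t shadow;
} dup_u32;

/* Write the value to both copies. */
static void dup_write(dup_u32 *v, uint32_t value)
{
    v->primary = value;
    v->shadow  = value;
}

/* Compare the two copies bit for bit before use.
 * Returns true and stores the value in *out if they match.
 * Returns false on any mismatch, leaving *out untouched, so the
 * caller can trigger its safety reaction. */
static bool dup_read(const dup_u32 *v, uint32_t *out)
{
    uint32_t a = v->primary;
    uint32_t b = v->shadow;

    if ((a ^ b) != 0u) {
        return false;
    }
    *out = a;
    return true;
}
```

The volatile qualifiers are there so the compiler does not fold the two reads into one and 
optimise the comparison away. What the caller does on a false return is system-specific.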

> Certainly nothing in software can be of any help when you are facing
> unreliabilities in the processor itself.

Aggressive online testing is really the only thing you can do to detect such errors in 
the first place and, hopefully, have time to perform a suitable safety reaction.
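As a rough illustration only, one simple form of online test is a known-answer check: run a 
few fixed operations and compare against results worked out in advance. The function name 
alu_self_test and the choice of operations are assumptions for this sketch, not a description 
of any particular product; a real test suite covers far more of the processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical known-answer self-test, meant to be called periodically.
 * The operands are volatile so the compiler cannot evaluate the
 * expressions at compile time; the arithmetic really runs on the CPU.
 * Returns false on the first result that differs from its expected value. */
static bool alu_self_test(void)
{
    volatile uint32_t a = 0xA5A5A5A5u;
    volatile uint32_t b = 0x5A5A5A5Au;

    if ((a ^ b) != 0xFFFFFFFFu) return false;             /* XOR        */
    if ((a & b) != 0x00000000u) return false;             /* AND        */
    if ((a | b) != 0xFFFFFFFFu) return false;             /* OR         */
    if ((uint32_t)(a + b) != 0xFFFFFFFFu) return false;   /* addition   */
    if ((uint32_t)(a << 1) != 0x4B4B4B4Au) return false;  /* left shift */
    return true;
}
```

A false return would lead to the same kind of safety reaction as a failed variable comparison.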

> Maybe I should be taking a slightly more humble tone here :-) I have
> done some safety-critical embedded software development, but not at your
> level - in the systems I have done, there has always been a human that
> can override the system in the event of failure, and it is always safe
> to switch off.

Well, when you've got hundreds of tonnes of train hurtling down the track, and the light 
says "RED", and your computer says "GREEN" because it's got a problem, you don't have time 
for humans to get involved. ;)

I'm no expert on this issue; I merely work in a group of experts providing an 
industrial-standard solution, and I'm learning a lot too.


> Well, if you are correct that such variable duplication is a help (I
> still don't see how, but you are more qualified than me to talk about
> it), then I agree that it would be a cool feature to have in gcc.

It is not supposed to be a full solution. Early detection of RAM corruption can only help 
take the system that is being depended on offline, so that backup systems can take its 
place.

> However, generally when I have seen discussions about the use of gcc in
> safety-critical development, the main concerns seem to be about testing,
> validation and certification of the compiler using things like Plum
> Hall. Do you use gcc for such safety-critical systems, and if so do you
> do any sort of certification for it?

Of course. Validation and certification of the compiler are required as part of its use 
for this purpose.

> The other feature that gcc could gain that would improve its use in
> safety-critical systems is a set of warnings for MISRA compliance. Much
> as I hate MISRA, it is a standard that is often used in such systems.

It's one thing to hate standards, it's another thing to implement them, ship them as part of 
a product, and reliably see the fruits of such labour in safety statistics. ;)


> And while bitflips due to radiation do occur, I think application code
> bugs are a much more common source of problems.

Absolutely, there are no absolutes! :)

> Compile-time warnings
> and error checking are steadily improving with each new version of gcc,
> but I'd say that more work here would have greater benefits for code
> safety than automatic variable duplication.

I concur - and this is why I suggest that anyone looking at a safety-critical application, 
wanting to do variable duplication for protective reasons, implement it themselves.


-- 
;                                           Thales Austria GmbH
Jay Vaughan,                                Scheydgasse 41
Software Developer                          1210 Vienna AUSTRIA
============================================--------------------


