Andrew Morton <akpm at linux-foundation.org> writes:

>
> Much of the onus is upon the various RAS tool developers to demonstrate why it
> is unsuitable for their use and, hopefully, to explain how it can be fixed for
> them.

My current take on the situation.

There are 4 different cases we care about.
- Trivial in-kernel failure message reports. (Oops, backtraces and the like)
- Crash dumps.
- Debuggers.
- Kernel probes.

The in-kernel failure messages seem to be doing a good job and are
reasonably simple to maintain.

For crash dumping we have sufficient infrastructure in the kernel now
in the kexec on panic work, and it is simpler and more reliable than
the previous attempts. Although those kernel code paths could be made
simpler yet and probably should be.

Only when it comes to debuggers does it seem we don't have something
we can generally settle on and agree on.

All I know is that any set of code that wants to be common
infrastructure and makes the assumption that the kernel is mostly
not broken is not interesting for use when things are fully automated,
because it fails to work in real world failure cases. Those things
only work in the artificial testing environments of developers.

Right now I have seen so little to seriously address these real world
concerns in suggestions or patches for some kind of infrastructure that
I'm tired of discussing it.

I admit I haven't seen or heard of those patches either, but even
their description sounds uninteresting.

Eric