> The even better way to detect this is to be able to check whether this
> is the kdump kernel and whether it got loaded due to a fatal MCE in the
> first kernel and then match that error address with the error address of
> the error which caused the first panic in the mce code. Then the second
> kernel won't need to panic but simply log.

The biggest problem with all of the logging (whether in machine check
banks, or in error records from BIOS) is the lack of a timestamp. If there
was a way to tell if this "just happened", or "happened a while ago" then
such "take action" or "just log" decisions would be simpler.

Maybe you don't need to do *all* those matching checks. Just a flag from
the first kernel to say "I died from a fatal machine check" could be used
to tell the kdump kernel "just log the cper" stuff.

If the system is broken enough that more machine checks are still firing
in the kdump kernel ... then you would miss trying to recover. But if more
machine checks are happening, then the kdump kernel is likely doomed anyway.

Getting a full memory dump after a machine check generally isn't all that
useful anyway. The problem was (almost certainly) h/w, so not much benefit
in decoding the dump to find which code was running when the h/w signalled.

A second bite at getting the error logs from the death of the first kernel
is worth it though.

-Tony