> The error context is in the behavior of the hw. If the error is fatal, you
> won't see it - the machine will panic or do something else to prevent error
> propagation. It definitely won't run any software anymore.
>
> If you see the error getting logged, it means it is not fatal enough to kill
> the machine.

One place in the fatal case where I would like to see more information
is the

  "Action required: data load in error *UN*recoverable area of kernel"

case [emphasis on the "UN" added].

We have a few places where the kernel does recover. And most places
we crash. Our code for the recoverable cases is fragile. Most of this
series is about repairing regressions where we used to recover from
places where the kernel is doing get_user() or copy_from_user(). Those
can be recovered if the code gets an error return and the kernel kills
the process instead of crashing.

A long time ago I posted some patches to include a stack trace for this
type of crash. They didn't make it into the kernel, and I got distracted
by other things. If we had that, it would have been easier to diagnose
this regression (Shaui Xie would have seen crashes with a stack trace
pointing to code that used to recover in older kernels). Folks with big
clusters would also be able to point out other places where the kernel
crashes often enough that additional EXTABLE recovery paths would be
worth investigating.

So:

1) We need to fix the regressions. That just needs new commit messages
   for these patches that explain the issue better.

2) I'd like to see a patch for a stack trace for the unrecoverable case.

3) I don't see much value in a message that reports the recoverable case.

Yazen: At one point I think you said you were looking at adding
additional decorations to the return value from mce_severity() to
indicate actions needed for recoverable errors (kill the process,
offline the page) rather than have do_machine_check() figure it out
by looking at various fields in the "struct mce". Did that go anywhere?
Those extra details might be interesting in the tracepoint.

-Tony
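
To make the "decorations on the return value" idea concrete, here is a
minimal user-space sketch of one way a severity value could carry action
hints in its upper bits. Every name in it (sketch_classify(),
SKETCH_ACT_*, struct sketch_error) is hypothetical; this is not the
kernel's mce_severity() interface, and the classification rules are
simplified assumptions based only on the cases described above.

```c
#include <stdbool.h>

/* Hypothetical severity levels; the kernel's real enum is different. */
enum sketch_severity {
	SKETCH_SEV_KEEP  = 0,	/* log it and carry on */
	SKETCH_SEV_AR    = 1,	/* action required, recoverable */
	SKETCH_SEV_PANIC = 2,	/* unrecoverable, machine must stop */
};

/* Low byte holds the severity, higher bits hold action hints. */
#define SKETCH_SEV_MASK		0x00ffu
#define SKETCH_ACT_KILL_TASK	0x0100u	/* kill the affected process */
#define SKETCH_ACT_OFFLINE_PAGE	0x0200u	/* take the bad page offline */

/* Simplified stand-in for the fields a real handler would inspect. */
struct sketch_error {
	bool in_kernel;		/* error hit while running kernel code */
	bool has_fixup;		/* assumed: site has an exception-table entry */
	bool addr_valid;	/* a usable physical address was reported */
};

/* Return a severity with the needed actions OR'd into the upper bits. */
static unsigned int sketch_classify(const struct sketch_error *e)
{
	unsigned int actions = 0;

	/* Kernel context with no fixup: nothing can recover this. */
	if (e->in_kernel && !e->has_fixup)
		return SKETCH_SEV_PANIC;

	/* User context, or kernel code with a fixup: kill the process. */
	actions |= SKETCH_ACT_KILL_TASK;

	/* The page can only be offlined if we know where it is. */
	if (e->addr_valid)
		actions |= SKETCH_ACT_OFFLINE_PAGE;

	return SKETCH_SEV_AR | actions;
}

/* Extract just the severity part of a decorated return value. */
static enum sketch_severity sketch_severity_of(unsigned int ret)
{
	return (enum sketch_severity)(ret & SKETCH_SEV_MASK);
}

/* Test whether a given action hint is set in a decorated return value. */
static bool sketch_wants(unsigned int ret, unsigned int action)
{
	return (ret & action) != 0;
}
```

With something of this shape the caller would only test the action bits
instead of re-deriving them from the error record, and the same bits
could be passed to a tracepoint unchanged.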