[RFC] Kdump and memory error handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
> 
> My old attempts to solve this are
> 
> Don't dump on MCE:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> 

The problem we seen in avoiding a panic->crash_kexec->[coredump capture] is
that the user may not have a means to know the reason for crash, unless
the serial console is connected to capture and store the panic string.

Alternatively a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but inform the user about the cause of the crash. I'm
intending to post some patches with a quick implementation of it soon.

> Handle dumps of corrupted memory regresions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> 

> IMHO these patches are still the right solutions for this.
> 

Like Vatsa had raised, the processor's behaviour upon reading (or any I/O
operation) the faulty memory location isn't clearly defined (to the
extent I read through System Programming Guide Part 1, Volume 3A,
Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which can
potentially read the faulty memory) is making things hazy.

Thanks,
K.Prasad




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux