On Mon, May 09, 2011 at 05:21:06PM +0200, Bouchard Louis wrote:
> Hello,
>
> On 09/05/2011 14:39, Vivek Goyal wrote:
> >
> > Prasad,
> >
> > I have never tried taking a dump in an MCE situation. Does kdump work on this
> > machine with a normal panic()?
> >
> > Use the --debug and --serial options in kexec-tools to print some debug messages
> > and look for "I am in purgatory". This will tell you whether you hung
> > in the first kernel or the second kernel.
> >
> > Then put "outb()" messages in the kernel to trace what happened.
> >
> > Thanks
> > Vivek
> >
> > _______________________________________________
> > kexec mailing list
> > kexec at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
> I have seen numerous occurrences of MCE-triggered kernel panics on both
> RHEL & SLES environments used on the IA32 architecture, in both cases
> in contexts where kexec/kdump was being used.
>

That's interesting! Assuming that these are not software-induced MCEs but
panic() calls invoked due to unrecoverable memory errors in a physical
machine, did you experience any situation where the kdump kernel hung or
rebooted due to a second MCE (triggered while reading the faulty memory
location belonging to the first kernel)?

> As a matter of fact, MCE-triggered panics are part of the reason that pushed
> me to work on crashdc: only one crash command is required to get the
> MCE trace out of the kernel ring buffer. This avoids transferring massive
> vmcore files over the net.
>

What data is contained in the faulty memory location (whose I/O triggered
an MCE in the first place)? Basically, we'd like to understand what a
'read' operation on the corrupted memory location would result in.

> crashdc does well on those; mcelog can be applied to the data gathered.

We're contemplating a solution along similar lines (refer to the description
of 'slim' kdump at https://lkml.org/lkml/2011/5/4/396) to create a
crash-tool-readable coredump containing a message that indicates the cause
of the crash as MCE (and not any data from the old memory). I'll take a look
at the crashdc code and see if there are ideas that we can borrow from there.

Thanks,
K.Prasad