On Mon, Oct 10, 2011 at 09:07:25AM +0200, Borislav Petkov wrote:
> On Fri, Oct 07, 2011 at 09:42:19PM +0530, K.Prasad wrote:
> > The problem, as pointed out by Borislav Petkov in a different mail, is that
> > we might end up capturing a vmcore containing corrupted data when the
> > same is not required for analysing the cause of the crash.
> >
> > Of course, all this is assuming that reading the faulty memory with MCE
> > disabled is harmless. However, the effect of a read operation in this
> > case is undefined.
>
> Frankly, I don't think that it is undefined - you basically should be
> able to read DRAM albeit with the corrupted data in it. However, you
> probably best disable the whole DRAM error detection first by clearing
> a couple of bits in MC4_CTL_MASK (at least on AMD that should work, I
> dunno how Intel does that).
>

MC4_CTL_MASK doesn't appear to be defined in the kernel. Page 196 of http://support.amd.com/us/Processor_TechDocs/26094.PDF states that "This register is typically programmed by BIOS and not by the Kernel software". So in any case, we may not be able to disable machine-check exceptions (MCEs) only within the context of the kexec'ed kernel. Let me know if I've missed something here.

> But, regardless, according to Vivek, the "makedumpfile" tool should be
> able to jump over poisoned pages and you don't need all the hoopla above
> at all, right?
>

In short, yes. We could add a new string, say "CRASH_REASON=PANIC_MCE", to the VMCOREINFO elf-note, which 'makedumpfile' can parse, and thereby avoid adding the new NT_NOCOREDUMP elf-note. Parsing log_buf from inside 'makedumpfile' to look for the panic string, on the other hand, appears to be a clumsy solution. The suggestion to make NT_NOCOREDUMP carry finer-grained information can equally be met by using meaningful strings in VMCOREINFO.
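To make the VMCOREINFO idea concrete, here is a minimal user-space sketch of the 'makedumpfile' side. It assumes the VMCOREINFO note body is plain newline-separated KEY=VALUE text. The CRASH_REASON key and PANIC_MCE value are only the strings proposed above, not something the kernel emits today, and vmcoreinfo_get()/want_slimdump() are hypothetical helpers, not makedumpfile's actual API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find "KEY=VALUE" in a VMCOREINFO text blob
 * (newline-separated lines) and copy VALUE into 'out', truncating to fit.
 * Returns true if the key was found.
 */
bool vmcoreinfo_get(const char *info, const char *key, char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *line = info;

    if (outlen == 0)
        return false;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* The key must match at the start of the line and be followed by '='. */
        if (len > keylen && strncmp(line, key, keylen) == 0 && line[keylen] == '=') {
            size_t vlen = len - keylen - 1;

            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, line + keylen + 1, vlen);
            out[vlen] = '\0';
            return true;
        }
        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/*
 * Hypothetical makedumpfile policy: collect a slim dump only when the
 * proposed CRASH_REASON string says the panic came from a fatal MCE.
 */
bool want_slimdump(const char *info)
{
    char reason[64];

    return vmcoreinfo_get(info, "CRASH_REASON", reason, sizeof(reason)) &&
           strcmp(reason, "PANIC_MCE") == 0;
}
```

Keeping the reason as a plain string means further reasons can be added later without changing any note format, which is what makes VMCOREINFO attractive here compared to a dedicated elf-note.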
---

In this context, I'd like to quickly recap the issues we've discussed so far and their proposed solutions, and to re-evaluate the need for a new elf-note.

i) Scenario 1: The system crashes because of a fatal MCE.

   Proposed solution: Add a new string to the VMCOREINFO elf-note from within the MCE panic path to indicate the cause of the crash. 'makedumpfile' recognises this string and collects a slim dump instead of the normal dump.

ii) Scenario 2: A system with PG_hwpoison (or landmine!) pages crashes because of a software bug. In this case, the kexec kernel would normally reboot upon reading a PG_hwpoison page. I'll soon send out a new version of the patchset implementing this.

   Proposed solution: Maintain a linked list of the PFNs whose corresponding 'struct page' has been marked PG_hwpoison. We could put this list to use in quite a few ways:

   - Make it a policy in the kernel not to act upon a 'read' request for such pages: return '0' from copy_oldmem_page() if the PFN is part of the PG_hwpoison list. I don't see why anybody would be interested in reading the contents of a corrupt page, so making this standard kernel behaviour should be acceptable (or so I hope :-)). The list of PFNs must be exported to user space (how? more on that below), so that downstream tools such as 'crash' recognise that the vmcore contains 'distorted' data in the regions corresponding to PG_hwpoison memory.

   - Export the PG_hwpoison PFN list through a new elf-note. Given that the PFN list can be long and its size is not known at compile time, I'm not sure that adding each PFN individually to the VMCOREINFO note would be a good idea; hence the new elf-note. Then teach 'makedumpfile' to recognise these PFNs (by exporting their VADDR or some such mechanism) and avoid reading those pages from /proc/vmcore. It can also collect these PFNs and pass them down to 'crash' to help it identify the 'distorted' memory locations.
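Both uses of the PFN list in ii) can be sketched in user-space C, assuming a simple singly linked list and a simulated old-memory buffer in place of the real /proc/vmcore path. sketch_copy_oldmem_page() only mimics the proposed check in copy_oldmem_page() (refuse the read and report 0 bytes copied) and does not share its real signature. NT_HWPOISON_PFNS, its value and the "LINUX" note name are hypothetical placeholders, since no such note type exists yet.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
/* Hypothetical note type; no such constant exists in <elf.h>. */
#define NT_HWPOISON_PFNS 0x48575030

/* Singly linked list of PFNs whose 'struct page' was marked PG_hwpoison. */
struct poison_pfn {
    uint64_t pfn;
    struct poison_pfn *next;
};

static struct poison_pfn *poison_list;

/* Record a poisoned PFN (pushed at the head). Returns 0, or -1 on ENOMEM. */
int poison_pfn_add(uint64_t pfn)
{
    struct poison_pfn *p = malloc(sizeof(*p));

    if (!p)
        return -1;
    p->pfn = pfn;
    p->next = poison_list;
    poison_list = p;
    return 0;
}

/* Linear search of the list. */
bool poison_pfn_test(uint64_t pfn)
{
    for (const struct poison_pfn *p = poison_list; p; p = p->next)
        if (p->pfn == pfn)
            return true;
    return false;
}

/*
 * Stand-in for the proposed copy_oldmem_page() check: never touch a
 * poisoned page and report 0 bytes copied; otherwise copy 'csize' bytes
 * from the simulated old-kernel memory.
 */
size_t sketch_copy_oldmem_page(uint64_t pfn, char *buf, size_t csize,
                               size_t offset, const char *oldmem)
{
    if (poison_pfn_test(pfn))
        return 0;
    memcpy(buf, oldmem + pfn * SKETCH_PAGE_SIZE + offset, csize);
    return csize;
}

/*
 * Pack the poisoned PFNs into one ELF note: Elf64_Nhdr, the name padded
 * to a 4-byte boundary, then the PFNs as 64-bit descriptor words.
 * Returns the note size in bytes, or 0 if 'cap' is too small.
 */
size_t build_pfn_note(unsigned char *out, size_t cap)
{
    static const char name[] = "LINUX";   /* hypothetical; namesz counts the NUL */
    size_t npfn = 0;

    for (const struct poison_pfn *p = poison_list; p; p = p->next)
        npfn++;

    size_t name_pad = (sizeof(name) + 3) & ~(size_t)3;
    size_t desc_sz = npfn * sizeof(uint64_t);
    size_t total = sizeof(Elf64_Nhdr) + name_pad + desc_sz;

    if (total > cap)
        return 0;

    Elf64_Nhdr nh = {
        .n_namesz = sizeof(name),
        .n_descsz = (Elf64_Word)desc_sz,
        .n_type = NT_HWPOISON_PFNS,
    };
    memset(out, 0, total);
    memcpy(out, &nh, sizeof(nh));
    memcpy(out + sizeof(nh), name, sizeof(name));

    unsigned char *desc = out + sizeof(nh) + name_pad;
    for (const struct poison_pfn *p = poison_list; p; p = p->next) {
        memcpy(desc, &p->pfn, sizeof(p->pfn));
        desc += sizeof(p->pfn);
    }
    return total;
}
```

A single note whose descriptor length grows with the number of PFNs sidesteps the compile-time sizing concern above. The linear search is only for illustration; a sorted array or bitmap would likely scale better if the list grows long.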
The kexec-ed kernel could still crash because of fatal MCEs in its own memory region, or because of new uncorrected memory errors in the old kernel's memory (errors that occurred after the crash) that may be 'read' during the memory copy operation. However, the probability of these occurrences is assumed to be small, given the short lifetime of the kexec-ed kernel.

While we don't actually need a new elf-note for i), I suspect that may not be the case for resolving ii). Kindly let me know your thoughts on this.

Thanks,
K.Prasad

P.S.: A quick definition of terms used above
-------------------------------------------

Fatal or unrecoverable MCE - A Machine Check Exception (MCE) that causes the system to panic. The exception might be triggered by a faulty piece of memory in a DIMM or cache. It is triggered by the 'consumption' (read/write) of a memory location with an uncorrected memory error.

PG_hwpoison - A page flag (set in 'struct page') when an uncorrected memory error is detected (through means such as memory scrubbing) but has not yet been 'consumed'. The page is flagged to prevent it from re-entering the memory stream. Consuming a page with this flag set causes the system to crash.