On 01/24/2017 at 09:46 AM, Xunlei Pang wrote:
> On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
>> Hey Tony,
>>
>> a "welcome back" is in order? :-)
>>
>> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>>> If the system had experienced some memory corruption, but
>>> recovered ... then there would be some pages sitting around
>>> that the old kernel had marked as POISON and stopped using.
>>> The kexec'd kernel doesn't know about these, so may touch that
>>> memory while taking a crash dump ...
>> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
>> touch. Looks like there's already functionality for that:
>>
>> "makedumpfile can exclude the following types of pages while copying
>> VMCORE to DUMPFILE, and a user can choose which type of pages will be
>> excluded.
>>
>> - Pages filled with zero
>> - Cache pages
>> - User process data pages
>> - Free pages"
>>
>> (there is a makedumpfile manpage somewhere)
>>
>> And apparently crash knows about poisoned pages and handles them:
>>
>> static int __init crash_save_vmcoreinfo_init(void)
>> {
>> 	...
>> #ifdef CONFIG_MEMORY_FAILURE
>> 	VMCOREINFO_NUMBER(PG_hwpoison);
>> #endif
>>
>> so if that works, the kexeced kernel should know about that list.
> From the log in my previous reply, MCE occurred before makedumpfile dumping,
> so I guess if the poisoned ones belong to the crash reserved memory or other
> type of events?

Another possibility may be any system reserved/PCIe memory that is shared
between the 1st and 2nd kernel.

>
> Besides, some kdump kernel may not use makedumpfile, for example a simple "cp"
> is also allowed to process "/proc/vmcore".
>
>>> and then you have a broadcast machine check (on older[1] Intel CPUs
>>> that don't support local machine check).
>> Right.
>>
>>> This is hard to work around. You really need all the CPUs to have set
>>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>>> the machine check). Also you need to make sure that they jump to the
>>> copy of do_machine_check() in the new kernel, not the old kernel.
>> Doesn't matter, right? The new copy is as clueless as the old one about
>> those MCEs.
>>
> It's the code in mce_start(), it waits for all the online cpus including the cpus
> that kdump boots on to synchronize.
>
> So for new mce handler of kdump kernel, it is fine as the number of online cpus
> is correct; as for old mce handler of 1st kernel, it's not true because some cpus
> which are regarded online from 1st kernel's view are running the 2nd kernel now,
> they can't respond to the old mce handler which will timeout the old mce handler.
>
> Regards,
> Xunlei
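
To make the timeout argument above concrete, here is a minimal user-space
sketch of a rendezvous that waits for every CPU it believes is online. It is
not the kernel's mce_start() code. The struct, the function names and the
polling model are all hypothetical and only illustrate why a stale "online"
count can never be satisfied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a machine-check rendezvous; not kernel code. */
struct rendezvous {
	int expected;   /* CPUs this handler believes are online */
	int responders; /* CPUs that can actually enter this handler */
	int callin;     /* CPUs that have checked in so far */
};

/* One polling step: at most one more reachable CPU checks in. */
static void poll_once(struct rendezvous *r)
{
	if (r->callin < r->responders)
		r->callin++;
}

/*
 * Wait until every expected CPU has checked in, or give up after
 * max_polls steps. Returns true on success, false on timeout.
 */
static bool wait_for_all(struct rendezvous *r, int max_polls)
{
	assert(r->expected > 0 && max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		poll_once(r);
		if (r->callin >= r->expected)
			return true;
	}
	return false; /* some expected CPUs never showed up */
}
```

In this model, expected = 4 with responders = 4 succeeds (the 2nd kernel's
view, where the online count is correct). With expected = 4 and
responders = 2 (the 1st kernel's view, where two CPUs it counts as online now
run the 2nd kernel), wait_for_all() returns false and callin stays at 2.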