(2013/11/07 4:02), jerry.hoemann at hp.com wrote:
> On Wed, Oct 23, 2013 at 12:01:18AM +0900, HATAYAMA Daisuke wrote:
>> This patch set is to allow kdump 2nd kernel to wake up multiple CPUs
>> even if 1st kernel crashs on some AP, a continueing work from:
>>
>> [PATCH v3 0/2] x86, apic, kdump: Disable BSP if boot cpu is AP
>> https://lkml.org/lkml/2013/10/16/300.
>>
>> In this version, basic design has changed. Now users need to figure
>> out initial APIC ID of BSP in the 1st kernel and configures kernel
>> parameter for the 2nd kernel manually using disable_cpu_apic kernel
>> parameter to be newly introduced in this patch set. This design is
>> more flexible than the previous version in that we no longer have to
>> rely on ACPI/MP table to get initial APIC ID of BSP.
>>
>> Sorry, this patch set have not include in-source documentation
>> requested by Borislav Petkov yet, but I'll post it later separately,
>> which would be better to focus on documentation reviewing.
>>
>> ChangeLog
>>
>> v3 => v4)
>>
>> - Rebased on top of v3.12-rc6
>>
>> - Basic design has been changed. Now users need to figure out initial
>> APIC ID of BSP in the 1st kernel and configures kernel parameter for
>> the 2nd kernel manually using disable_cpu_apic kernel parameter to
>> be newly introduced in this patch set. This design is more flexible
>> than the previous version in that we no longer have to rely on
>> ACPI/MP table to get initial APIC ID of BSP.
>>
>
>
> Daisuke,
>
> I have back ported version 4 of this patch to both a 2.6.32 and 3.0.80
> based kernels and distros and tested on a prototype system. I have
> previously test version 1 & 3 as well.)
>
> The systems are configured to boot the capture kernel 8-way parallel.
> However, I am running makedumpfile single threaded.
>
> Panic is induced via "echo c > /proc/sysrq-trigger". This is done
> under various system loads and on random cpus. I have done over a
> thousand dumps total during this testing.
>

Thanks for your testing.
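
For anyone following the thread, the manual step described in the quoted
cover letter (finding the initial APIC ID of the BSP in the 1st kernel and
passing it to the 2nd kernel) could look roughly like the sketch below. It
assumes an x86 Linux system where /proc/cpuinfo has an "initial apicid"
field and where processor 0 is the BSP of the 1st kernel. The helper name
bsp_initial_apicid is hypothetical, and disable_cpu_apic is the parameter
name as written in this cover letter, which may differ in the final patches.

```shell
#!/bin/sh
# Sketch only: print the "initial apicid" value of processor 0 from a
# cpuinfo-style file. Assumes processor 0 is the BSP of the 1st kernel.
# The file argument defaults to /proc/cpuinfo.
bsp_initial_apicid() {
    awk -F': *' '
        /^processor[ \t]*:/ { cpu = $2 }
        /^initial apicid[ \t]*:/ && cpu == "0" { print $2; exit }
    ' "${1:-/proc/cpuinfo}"
}

# Example use (parameter name taken from the cover letter, not verified
# against the final patches):
#   id=$(bsp_initial_apicid) && [ -n "$id" ] && echo "disable_cpu_apic=$id"
```

The printed string would then be added by hand to the command line of the
2nd (kdump) kernel, using whatever mechanism the kdump setup in use
provides for that.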
> I have seen no issues w/ the 3.0.80 dump testing on our proto.
>
> On the 2.6.32 testing on our proto, i have hit a low probability (< 5%)
> chance of the capture suffering a soft lockup hang during
> "Switching to clocksource hpet." I have not RCA'd this yet.
> Note, I have seen this issue on earlier version of the patch, so
> it is not specific to this version.
>
> I then tested the 2.6.32 port on a dl380. This worked without issue.
>
> Note, I have seen no issues related to this patch on our proto when
> booting the capture with a single processor.
>
> While I am still pursuing the issue of the 2.6.32 kernel on our proto,
> I believe this patch is good and should be accepted.
>

It seems the problem depends on the system you used. However, I have never
verified my patch set on a 2.6.32-based kernel, so I'll try a similar test
on some FJ systems.

By the 2.6.32-based kernel, you mean one of the Longterm release kernels,
right? So in the test you used a 2.6.32-based Longterm release kernel with
my v4 patch applied, right?

Since you didn't see the bug on the 3.0.80-based kernel, the root cause
seems to have already been fixed in more recent kernels, so I think a
binary search would be useful.

--
Thanks.
HATAYAMA, Daisuke