On Fri, Feb 17, 2006 at 01:50:01PM -0800, Raj, Ashok wrote:
> Which kernel version does this happen? Or doesn't it matter?

This particular test was run on 2.6.16-r3-mm1, however I have seen this
issue on other 2.6.16-r* kernels as well.

Bryce

> >-----Original Message-----
> >From: hotplug_sig-bounces@xxxxxxxxxxxxxx [mailto:hotplug_sig-bounces@xxxxxxxxxxxxxx] On Behalf Of Bryce Harrington
> >Sent: Friday, February 17, 2006 1:39 PM
> >To: hotplug_sig@xxxxxxxxxxxxxx
> >Subject: [Hotplug_sig] Bug in CPU Hotplug on x86
> >
> >Hi Martine and Mary,
> >
> >I've gotten the hotplug cpu test working and automated now on an x86
> >box. :-)
> >
> >I've also already found a bug in CPU hotplug on this platform. It looks
> >like a legitimate issue, that Hotplug SIG can report to the developers,
> >but I wanted to run it by folks on this list first. Can you review this
> >and let me know if it should be reported? And who would be the best
> >person to show this bug report to?
> >
> >
> >This fault occurs on the first hotplug test. This test attempts to
> >offline and then online each of the CPU's. It is failing when onlining
> >the CPU that it just offlined; this results in a system lockup
> >(requiring power cycling).
> >
> >
> >Here is the output from hotplug01.sh:
> >
> >  Name: hotplug01
> >  Date: Wed Feb 15 12:19:16 PST 2006
> >  Desc: What happens to disk controller interrupts when offlining CPUs?
> >
> >  CPU is 0
> >  Starting loop '1'
> >  offlining cpu1: OK
> >  offlining cpu0: OK
> >  onlining cpu1: OK
> >
> >At this point the system locks up.
> >
> >
> >During the test run, I'm seeing the following output from dmesg:
> >
> >  Breaking affinity for irq 0
> >  CPU 1 is now offline
> >  Booting processor 1/0 eip 2000
> >  CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
> >  Initializing CPU#1
> >  Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
> >  CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> >  CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> >  CPU: L1 I cache: 16K, L1 D cache: 16K
> >  CPU: L2 cache: 256K
> >  CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
> >  Intel machine check architecture supported.
> >  Intel machine check reporting enabled on CPU#1.
> >  CPU1: Intel Pentium III (Coppermine) stepping 06
> >  APIC error on CPU1: 00(40)
> >
> >
> >This is what /var/log/messages shows:
> >
> >Feb 15 12:19:17 cl009 Breaking affinity for irq 0
> >Feb 15 12:19:17 cl009 CPU 1 is now offline
> >Feb 15 12:19:19 cl009 Booting processor 1/0 eip 2000
> >Feb 15 12:19:19 cl009 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
> >Feb 15 12:19:19 cl009 Initializing CPU#1
> >Feb 15 12:19:19 cl009 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
> >Feb 15 12:19:19 cl009 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> >Feb 15 12:19:19 cl009 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> >Feb 15 12:19:19 cl009 CPU: L1 I cache: 16K, L1 D cache: 16K
> >Feb 15 12:19:19 cl009 CPU: L2 cache: 256K
> >Feb 15 12:19:19 cl009 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
> >Feb 15 12:19:19 cl009 Intel machine check architecture supported.
> >Feb 15 12:19:19 cl009 Intel machine check reporting enabled on CPU#1.
> >Feb 15 12:19:19 cl009 CPU1: Intel Pentium III (Coppermine) stepping 06
> >Feb 15 12:19:19 cl009 APIC error on CPU1: 00(40)
> >
> >
> >I've been able to reproduce this error 3 out of 3 times on this
> >particular system. It is a Pentium III with the following
> >/proc/cpuinfo:
> >
> >  processor       : 0
> >  vendor_id       : GenuineIntel
> >  cpu family      : 6
> >  model           : 8
> >  model name      : Pentium III (Coppermine)
> >  stepping        : 6
> >  cpu MHz         : 866.932
> >  cache size      : 256 KB
> >  fdiv_bug        : no
> >  hlt_bug         : no
> >  f00f_bug        : no
> >  coma_bug        : no
> >  fpu             : yes
> >  fpu_exception   : yes
> >  cpuid level     : 2
> >  wp              : yes
> >  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
> >  bogomips        : 1736.35
> >
> >  processor       : 1
> >  vendor_id       : GenuineIntel
> >  cpu family      : 6
> >  model           : 8
> >  model name      : Pentium III (Coppermine)
> >  stepping        : 6
> >  cpu MHz         : 866.932
> >  cache size      : 256 KB
> >  fdiv_bug        : no
> >  hlt_bug         : no
> >  f00f_bug        : no
> >  coma_bug        : no
> >  fpu             : yes
> >  fpu_exception   : yes
> >  cpuid level     : 2
> >  wp              : yes
> >  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
> >  bogomips        : 1733.57
> >
> >
> >Bryce
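
For anyone who wants to try the offline/online step without the full
test suite, here is a minimal sketch of what such a cycle could look
like. This is not the actual hotplug01.sh. It assumes the usual sysfs
control file (/sys/devices/system/cpu/cpuN/online), and the names
set_cpu_state, cycle_cpu and CPU_SYSFS_ROOT are made up for this
example. CPU_SYSFS_ROOT exists only so the functions can be pointed at a
scratch directory instead of the real sysfs tree.

```shell
#!/bin/sh
# Hypothetical sketch, not the real hotplug01.sh.
# CPU_SYSFS_ROOT is an assumption of this example; on a live system the
# control files sit under /sys/devices/system/cpu and need root to write.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# set_cpu_state CPU_NAME STATE
# STATE is 0 to take the CPU offline, 1 to bring it online.
set_cpu_state() {
    ctl="$CPU_SYSFS_ROOT/$1/online"
    # Not every CPU exposes a writable "online" control file.
    [ -w "$ctl" ] || return 1
    echo "$2" > "$ctl" || return 1
    # Read the state back to confirm the change was accepted.
    [ "$(cat "$ctl")" = "$2" ]
}

# cycle_cpu CPU_NAME
# Offline the CPU, then bring it back online, reporting each step.
cycle_cpu() {
    for state in 0 1; do
        if [ "$state" = 0 ]; then action=offlining; else action=onlining; fi
        if set_cpu_state "$1" "$state"; then
            echo "$action $1: OK"
        else
            echo "$action $1: FAILED"
            return 1
        fi
    done
}
```

After sourcing the file as root, `cycle_cpu cpu1` runs one cycle. Note
that on a box showing the behavior reported above, the onlining step may
hang the machine, so only try it where a power cycle is acceptable.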