On Fri, Feb 17, 2006 at 04:47:51PM -0500, Silbermann, Martine wrote:
>
> hi Bryce,
>
> I'm confused, you have a 2 CPU system and you're trying to offline cpu0
> after offlining cpu1, on a 2 CPU system this should NOT be allowed for
> obvious reasons :-)
> So why does the test reply "OK" to offlining cpu0?

Yes, it should be reporting an error; obviously it's not offlining cpu0,
or else the machine would stop at that point. I'll clarify this error
reporting in the next release of the testsuite - the code is just
resetting the error value before it gets a chance to report it.

> Martine
>
> -----Original Message-----
> From: hotplug_sig-bounces@xxxxxxxxxxxxxx
> [mailto:hotplug_sig-bounces@xxxxxxxxxxxxxx] On Behalf Of Bryce
> Harrington
> Sent: Friday, February 17, 2006 4:39 PM
> To: hotplug_sig@xxxxxxxxxxxxxx
> Subject: [Hotplug_sig] Bug in CPU Hotplug on x86
>
>
> Hi Martine and Mary,
>
> I've gotten the hotplug cpu test working and automated now on an x86
> box. :-)
>
> I've also already found a bug in CPU hotplug on this platform. It looks
> like a legitimate issue, that Hotplug SIG can report to the developers,
> but I wanted to run it by folks on this list first. Can you review this
> and let me know if it should be reported? And who would be the best
> person to show this bug report to?
>
>
> This fault occurs on the first hotplug test. This test attempts to
> offline and then online each of the CPUs. It is failing when onlining
> the CPU that it just offlined; this results in a system lockup
> (requiring power cycling).
>
>
> Here is the output from hotplug01.sh:
>
> Name: hotplug01
> Date: Wed Feb 15 12:19:16 PST 2006
> Desc: What happens to disk controller interrupts when offlining
> CPUs?
>
> CPU is 0
> Starting loop '1'
> offlining cpu1: OK
> offlining cpu0: OK
> onlining cpu1: OK
>
> At this point the system locks up.
>
>
> During the test run, I'm seeing the following output from dmesg:
>
> Breaking affinity for irq 0
> CPU 1 is now offline
> Booting processor 1/0 eip 2000
> CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
> CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> CPU: L1 I cache: 16K, L1 D cache: 16K
> CPU: L2 cache: 256K
> CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#1.
> CPU1: Intel Pentium III (Coppermine) stepping 06
> APIC error on CPU1: 00(40)
>
>
> This is what /var/log/messages shows:
>
> Feb 15 12:19:17 cl009 Breaking affinity for irq 0
> Feb 15 12:19:17 cl009 CPU 1 is now offline
> Feb 15 12:19:19 cl009 Booting processor 1/0 eip 2000
> Feb 15 12:19:19 cl009 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
> Feb 15 12:19:19 cl009 Initializing CPU#1
> Feb 15 12:19:19 cl009 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
> Feb 15 12:19:19 cl009 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 CPU: L1 I cache: 16K, L1 D cache: 16K
> Feb 15 12:19:19 cl009 CPU: L2 cache: 256K
> Feb 15 12:19:19 cl009 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 Intel machine check architecture supported.
> Feb 15 12:19:19 cl009 Intel machine check reporting enabled on CPU#1.
> Feb 15 12:19:19 cl009 CPU1: Intel Pentium III (Coppermine) stepping 06
> Feb 15 12:19:19 cl009 APIC error on CPU1: 00(40)
>
>
> I've been able to reproduce this error 3 out of 3 times on this
> particular system. It is a Pentium III with the following
> /proc/cpuinfo:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 8
> model name : Pentium III (Coppermine)
> stepping : 6
> cpu MHz : 866.932
> cache size : 256 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
> bogomips : 1736.35
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 8
> model name : Pentium III (Coppermine)
> stepping : 6
> cpu MHz : 866.932
> cache size : 256 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
> bogomips : 1733.57
>
>
> Bryce
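
For illustration, the error-reporting problem described in the reply at the
top (the status being reset before it is reported) can be sketched roughly as
follows. This is not the actual testsuite code: the function names
set_cpu_state, report_buggy and report_fixed are hypothetical, and the sketch
assumes the CPU is controlled through a sysfs-style "online" file that is
missing or unwritable when the CPU cannot be offlined.

```shell
#!/bin/sh
# Hypothetical sketch only; names and structure are assumptions,
# not taken from hotplug01.sh.

# Write the desired state (0 or 1) to a CPU's "online" control file.
# $1 = path to the control file, $2 = desired state.
set_cpu_state() {
    # A missing or unwritable file means the state cannot be changed.
    [ -w "$1" ] || return 1
    echo "$2" > "$1" || return 1
    # Read the value back to confirm the change took effect.
    [ "$(cat "$1")" = "$2" ]
}

# Buggy pattern: the saved status is overwritten before it is checked,
# so the result is always reported as OK.
report_buggy() {
    rc=0
    set_cpu_state "$1" "$2" || rc=$?
    printf '%s: ' "$3"
    rc=$?        # bug: this is printf's status (0), not set_cpu_state's
    if [ "$rc" -eq 0 ]; then echo "OK"; else echo "FAIL"; fi
}

# Fixed pattern: capture the status once and leave it alone until
# it has been reported.
report_fixed() {
    rc=0
    set_cpu_state "$1" "$2" || rc=$?
    printf '%s: ' "$3"
    if [ "$rc" -eq 0 ]; then echo "OK"; else echo "FAIL"; fi
}
```

On a real system this would be run as root with something like
report_fixed /sys/devices/system/cpu/cpu1/online 0 "offlining cpu1".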