[Hotplug_sig] Bug in CPU Hotplug on x86

Hi Martine and Mary,

I now have the CPU hotplug test working and automated on an x86
box.  :-)

I've also already found a bug in CPU hotplug on this platform.  It looks
like a legitimate issue that the Hotplug SIG can report to the
developers, but I wanted to run it by folks on this list first.  Can you
review this and let me know whether it should be reported, and who would
be the best person to send the bug report to?


This fault occurs on the first hotplug test, which attempts to offline
and then online each of the CPUs.  It fails when onlining the CPU that
it just offlined, and the result is a system lockup (requiring a power
cycle).
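
For anyone who wants to try the same offline/online cycle by hand, here
is a minimal sketch.  It is not the actual hotplug01.sh; the function
names and the SYSFS_CPU_DIR variable are made up for illustration.  It
assumes the kernel exposes the usual per-CPU "online" control file in
sysfs and that the caller is root.

```shell
#!/bin/sh
# Hypothetical sketch of one offline/online cycle; NOT the real hotplug01.sh.
# SYSFS_CPU_DIR is an illustrative override so the functions can be
# exercised against a scratch directory instead of the live system.
SYSFS_CPU_DIR="${SYSFS_CPU_DIR:-/sys/devices/system/cpu}"

# set_cpu_state CPU STATE
#   Write 0 (offline) or 1 (online) to the CPU's control file.
set_cpu_state() {
    ctl="$SYSFS_CPU_DIR/cpu$1/online"
    # A CPU without a writable control file cannot be hotplugged from here.
    [ -w "$ctl" ] || return 1
    echo "$2" > "$ctl"
}

# cycle_cpu CPU
#   Offline the CPU, then bring it back online, reporting each step.
cycle_cpu() {
    printf 'offlining cpu%s:  ' "$1"
    if set_cpu_state "$1" 0; then echo OK; else echo FAIL; return 1; fi
    printf 'onlining cpu%s:  ' "$1"
    if set_cpu_state "$1" 1; then echo OK; else echo FAIL; return 1; fi
}
```

With the functions loaded, running "cycle_cpu 1" as root performs one
cycle on CPU 1.  On a system affected by this bug, expect the onlining
step to hang the machine.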


Here is the output from hotplug01.sh:

  Name:   hotplug01
  Date:   Wed Feb 15 12:19:16 PST 2006
  Desc:   What happens to disk controller interrupts when offlining CPUs?

  CPU is 0
  Starting loop '1'
  offlining cpu1:  OK
  offlining cpu0:  OK
  onlining cpu1:  OK

At this point the system locks up.


During the test run, I'm seeing the following output from dmesg:

 Breaking affinity for irq 0
 CPU 1 is now offline
 Booting processor 1/0 eip 2000
 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
 Initializing CPU#1
 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
 CPU: L1 I cache: 16K, L1 D cache: 16K
 CPU: L2 cache: 256K
 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
 Intel machine check architecture supported.
 Intel machine check reporting enabled on CPU#1.
 CPU1: Intel Pentium III (Coppermine) stepping 06
 APIC error on CPU1: 00(40)
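
The last line may be worth decoding.  As far as I can tell, the two hex
values are the local APIC error status register as read before and after
the kernel's error interrupt handler clears it, and the bit meanings
below follow the comment in the i386 APIC code.  Treat both as
assumptions to check against the kernel source; decode_apic_esr is just
a helper name made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print the meaning of each bit set in an APIC
# error status value given as two hex digits (e.g. "40").
# Bit names are assumed from the comment in the i386 kernel's APIC
# error interrupt handler; verify against your kernel tree.
decode_apic_esr() {
    esr=$((0x$1))
    bit=0
    for name in "send checksum error" "receive checksum error" \
                "send accept error" "receive accept error" \
                "reserved" "send illegal vector" \
                "received illegal vector" "illegal register address"
    do
        if [ $(( esr & (1 << bit) )) -ne 0 ]; then
            echo "bit $bit: $name"
        fi
        bit=$((bit + 1))
    done
}
```

Under those assumptions, "decode_apic_esr 40" reports bit 6 (received
illegal vector), and "decode_apic_esr 00" prints nothing.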


This is what /var/log/messages shows:

Feb 15 12:19:17 cl009 Breaking affinity for irq 0
Feb 15 12:19:17 cl009 CPU 1 is now offline
Feb 15 12:19:19 cl009 Booting processor 1/0 eip 2000
Feb 15 12:19:19 cl009 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
Feb 15 12:19:19 cl009 Initializing CPU#1
Feb 15 12:19:19 cl009 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
Feb 15 12:19:19 cl009 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
Feb 15 12:19:19 cl009 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
Feb 15 12:19:19 cl009 CPU: L1 I cache: 16K, L1 D cache: 16K
Feb 15 12:19:19 cl009 CPU: L2 cache: 256K
Feb 15 12:19:19 cl009 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
Feb 15 12:19:19 cl009 Intel machine check architecture supported.
Feb 15 12:19:19 cl009 Intel machine check reporting enabled on CPU#1.
Feb 15 12:19:19 cl009 CPU1: Intel Pentium III (Coppermine) stepping 06
Feb 15 12:19:19 cl009 APIC error on CPU1: 00(40)


I've reproduced this error 3 out of 3 times on this particular system.
It is a dual-processor Pentium III machine with the following
/proc/cpuinfo:

 processor       : 0
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 8
 model name      : Pentium III (Coppermine)
 stepping        : 6
 cpu MHz         : 866.932
 cache size      : 256 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 2
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
 bogomips        : 1736.35

 processor       : 1
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 8
 model name      : Pentium III (Coppermine)
 stepping        : 6
 cpu MHz         : 866.932
 cache size      : 256 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 2
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
 bogomips        : 1733.57


Bryce
