[Hotplug_sig] Bug in CPU Hotplug on x86


hi Bryce,

I'm confused: you have a 2-CPU system and you're trying to offline cpu0
after offlining cpu1. On a 2-CPU system this should NOT be allowed, for
obvious reasons :-)
So why does the test report "OK" for offlining cpu0?

Martine
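Martine's objection can be phrased as an expectation the test could check: offlining a CPU should fail when it is the last one still online. Below is a minimal Python model of that expectation. The names `expected_offline_result` and `replay` are hypothetical and are not part of hotplug01.sh or the kernel. The rule it encodes is only the behavior Martine describes, not a statement of what the kernel actually returns.

```python
def expected_offline_result(cpu: int, online: set[int]) -> bool:
    """Return True if offlining `cpu` should succeed.

    Hypothetical model: a CPU may be offlined only if it is currently
    online and at least one other CPU would remain online afterwards.
    """
    return cpu in online and len(online) > 1


def replay(sequence: list[tuple[str, int]], online: set[int]) -> list[str]:
    """Replay ("offline" | "online", cpu) steps and report expected verdicts."""
    online = set(online)  # work on a copy so the caller's set is unchanged
    report = []
    for action, cpu in sequence:
        if action == "offline":
            ok = expected_offline_result(cpu, online)
            if ok:
                online.discard(cpu)
            verb = "offlining"
        else:
            # Model assumption: onlining succeeds only for a CPU that is offline.
            ok = cpu not in online
            if ok:
                online.add(cpu)
            verb = "onlining"
        report.append(f"{verb} cpu{cpu}:  {'OK' if ok else 'FAIL'}")
    return report
```

Replaying the sequence from Bryce's log, `replay([("offline", 1), ("offline", 0), ("online", 1)], {0, 1})`, gives `OK`, `FAIL`, `OK` under this model, whereas the test printed `OK` for the second step.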

-----Original Message-----
From: hotplug_sig-bounces@xxxxxxxxxxxxxx
[mailto:hotplug_sig-bounces@xxxxxxxxxxxxxx] On Behalf Of Bryce
Harrington
Sent: Friday, February 17, 2006 4:39 PM
To: hotplug_sig@xxxxxxxxxxxxxx
Subject: [Hotplug_sig] Bug in CPU Hotplug on x86


Hi Martine and Mary,

I've gotten the hotplug cpu test working and automated now on an x86
box.  :-)

I've also already found a bug in CPU hotplug on this platform.  It looks
like a legitimate issue that Hotplug SIG can report to the developers,
but I wanted to run it by folks on this list first.  Can you review this
and let me know if it should be reported?  And who would be the best
person to show this bug report to?


This fault occurs on the first hotplug test, which attempts to offline
and then online each of the CPUs. It fails when onlining the CPU that it
has just offlined; this results in a system lockup that requires a power
cycle.
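On Linux, a CPU is taken offline or brought back online by writing `0` or `1` to `/sys/devices/system/cpu/cpuN/online`. What follows is a minimal Python sketch of one offline-then-online cycle per CPU over that interface. It is not the hotplug01.sh implementation, and the test's real ordering may differ: the log shows both CPUs offlined before cpu1 is onlined. The `root` parameter is a hypothetical hook that lets the logic run against a scratch directory instead of the real sysfs. Against real sysfs the writes need root privileges, and a write the kernel rejects surfaces in Python as an `OSError`.

```python
from pathlib import Path

# Standard sysfs location of the per-CPU hotplug controls.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU) -> bool:
    """Write "1" or "0" to cpuN/online; return True if the write succeeded."""
    path = root / f"cpu{cpu}" / "online"
    try:
        path.write_text("1" if online else "0")
    except OSError:
        # e.g. no such CPU directory, or the kernel refused the write.
        return False
    return True


def hotplug_cycle(cpus: list[int], root: Path = SYSFS_CPU) -> list[str]:
    """Offline and then re-online each CPU in turn, logging every step."""
    log = []
    for cpu in cpus:
        for online, verb in ((False, "offlining"), (True, "onlining")):
            ok = set_cpu_online(cpu, online, root)
            log.append(f"{verb} cpu{cpu}:  {'OK' if ok else 'FAIL'}")
    return log
```

Reporting `OK` only when the write itself succeeds is the point Martine raises: a test that prints `OK` regardless of the write's outcome would hide a rejected offline.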


Here is the output from hotplug01.sh:

  Name:   hotplug01
  Date:   Wed Feb 15 12:19:16 PST 2006
  Desc:   What happens to disk controller interrupts when offlining CPUs?

  CPU is 0
  Starting loop '1'
  offlining cpu1:  OK
  offlining cpu0:  OK
  onlining cpu1:  OK

At this point the system locks up.


During the test run, I'm seeing the following output from dmesg:

 Breaking affinity for irq 0
 CPU 1 is now offline
 Booting processor 1/0 eip 2000
 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
 Initializing CPU#1
 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
 CPU: L1 I cache: 16K, L1 D cache: 16K
 CPU: L2 cache: 256K
 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
 Intel machine check architecture supported.
 Intel machine check reporting enabled on CPU#1.
 CPU1: Intel Pentium III (Coppermine) stepping 06
 APIC error on CPU1: 00(40)
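The last line, `APIC error on CPU1: 00(40)`, comes from the local APIC error interrupt, which reports the APIC's Error Status Register (ESR) in hex. The sketch below decodes that value under two assumptions. First, the number in parentheses is the freshly latched ESR reading. Second, the bit layout matches Intel's documentation of the local APIC ESR. The helper names `decode_esr` and `decode_apic_error` are hypothetical. Under those assumptions, `40` is bit 6, "Received illegal vector".

```python
# Local APIC Error Status Register bits (assumed layout, per Intel's manuals).
# Bit 4 is treated as reserved in this sketch.
ESR_BITS = {
    0: "Send checksum error",
    1: "Receive checksum error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_esr(value: int) -> list[str]:
    """Return the names of all ESR error bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(ESR_BITS.items()) if value & (1 << bit)]


def decode_apic_error(line: str) -> list[str]:
    """Decode the parenthesised hex value in an 'APIC error on CPUn: xx(yy)' line."""
    inner = line.rsplit("(", 1)[1].rstrip(")\n ")
    return decode_esr(int(inner, 16))
```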


This is what /var/log/messages shows:

Feb 15 12:19:17 cl009 Breaking affinity for irq 0
Feb 15 12:19:17 cl009 CPU 1 is now offline
Feb 15 12:19:19 cl009 Booting processor 1/0 eip 2000
Feb 15 12:19:19 cl009 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
Feb 15 12:19:19 cl009 Initializing CPU#1
Feb 15 12:19:19 cl009 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
Feb 15 12:19:19 cl009 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
Feb 15 12:19:19 cl009 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
Feb 15 12:19:19 cl009 CPU: L1 I cache: 16K, L1 D cache: 16K
Feb 15 12:19:19 cl009 CPU: L2 cache: 256K
Feb 15 12:19:19 cl009 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
Feb 15 12:19:19 cl009 Intel machine check architecture supported.
Feb 15 12:19:19 cl009 Intel machine check reporting enabled on CPU#1.
Feb 15 12:19:19 cl009 CPU1: Intel Pentium III (Coppermine) stepping 06
Feb 15 12:19:19 cl009 APIC error on CPU1: 00(40)


I've been able to reproduce this error 3 out of 3 times on this
particular system.  It is a Pentium III with the following
/proc/cpuinfo:

 processor       : 0
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 8
 model name      : Pentium III (Coppermine)
 stepping        : 6
 cpu MHz         : 866.932
 cache size      : 256 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 2
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
 bogomips        : 1736.35

 processor       : 1
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 8
 model name      : Pentium III (Coppermine)
 stepping        : 6
 cpu MHz         : 866.932
 cache size      : 256 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 2
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
 bogomips        : 1733.57


Bryce

