[Hotplug_sig] Bug in CPU Hotplug on x86

On Fri, Feb 17, 2006 at 04:47:51PM -0500, Silbermann, Martine wrote:
> 
> hi Bryce,
> 
> I'm confused: you have a 2 CPU system and you're trying to offline cpu0
> after offlining cpu1. On a 2 CPU system this should NOT be allowed, for
> obvious reasons :-)
> So why does the test reply "OK" to offlining cpu0?

Yes, it should be reporting an error; obviously it's not actually
offlining cpu0, or else the machine would stop at that point.  I'll fix
this error reporting in the next release of the testsuite - the code is
just resetting the error value before it gets a chance to report it.
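
For illustration, here is a minimal sketch of that kind of bug.  This is
not the testsuite's actual code: the function names (offline_cpu,
report_buggy, report_fixed) and the optional sysfs-root argument are made
up for the example.  It only assumes the usual
/sys/devices/system/cpu/cpuN/online interface.

```shell
#!/bin/sh
# Hypothetical sketch; none of these names come from hotplug01.sh.

# offline_cpu CPU [SYSFS_ROOT]
# Try to offline a CPU by writing 0 to its "online" file.
# Returns non-zero if the write (or opening the file) fails.
offline_cpu() {
    root=${2:-/sys/devices/system/cpu}
    ( echo 0 > "$root/cpu$1/online" ) 2>/dev/null
}

# Buggy pattern: the printf succeeds, so by the time $? is tested it
# holds printf's status (0), not the status of offline_cpu.
report_buggy() {
    offline_cpu "$1" "$2"
    printf 'offlining cpu%s:  ' "$1"
    if [ $? -eq 0 ]; then echo OK; else echo FAIL; fi
}

# Fixed pattern: save the status before running anything else.
report_fixed() {
    offline_cpu "$1" "$2"
    rc=$?
    printf 'offlining cpu%s:  ' "$1"
    if [ "$rc" -eq 0 ]; then echo OK; else echo FAIL; fi
}
```

With the buggy version a failed offline still prints "OK", which matches
what the test output shows for cpu0.  The second argument is only there
so the functions can be pointed at a scratch directory instead of the
real sysfs tree.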
 
> Martine
> 
> -----Original Message-----
> From: hotplug_sig-bounces@xxxxxxxxxxxxxx
> [mailto:hotplug_sig-bounces@xxxxxxxxxxxxxx] On Behalf Of Bryce
> Harrington
> Sent: Friday, February 17, 2006 4:39 PM
> To: hotplug_sig@xxxxxxxxxxxxxx
> Subject: [Hotplug_sig] Bug in CPU Hotplug on x86
> 
> 
> Hi Martine and Mary,
> 
> I've gotten the hotplug cpu test working and automated now on an x86
> box.  :-)
> 
> I've also already found a bug in CPU hotplug on this platform.  It looks
> like a legitimate issue that the Hotplug SIG can report to the developers,
> but I wanted to run it by folks on this list first.  Can you review this
> and let me know if it should be reported?  And who would be the best
> person to show this bug report to?
> 
> 
> This fault occurs on the first hotplug test.  This test attempts to
> offline and then online each of the CPUs.  It is failing when onlining
> the CPU that it just offlined; this results in a system lockup
> (requiring power cycling).
> 
> 
> Here is the output from hotplug01.sh:
> 
>   Name:   hotplug01
>   Date:   Wed Feb 15 12:19:16 PST 2006
>   Desc:   What happens to disk controller interrupts when offlining CPUs?
> 
>   CPU is 0
>   Starting loop '1'
>   offlining cpu1:  OK
>   offlining cpu0:  OK
>   onlining cpu1:  OK
> 
> At this point the system locks up.
> 
> 
> During the test run, I'm seeing the following output from dmesg:
> 
>  Breaking affinity for irq 0
>  CPU 1 is now offline
>  Booting processor 1/0 eip 2000
>  CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
>  Initializing CPU#1
>  Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
>  CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
>  CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
>  CPU: L1 I cache: 16K, L1 D cache: 16K
>  CPU: L2 cache: 256K
>  CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
>  Intel machine check architecture supported.
>  Intel machine check reporting enabled on CPU#1.
>  CPU1: Intel Pentium III (Coppermine) stepping 06
>  APIC error on CPU1: 00(40)
> 
> 
> This is what /var/log/messages shows:
> 
> Feb 15 12:19:17 cl009 Breaking affinity for irq 0
> Feb 15 12:19:17 cl009 CPU 1 is now offline
> Feb 15 12:19:19 cl009 Booting processor 1/0 eip 2000
> Feb 15 12:19:19 cl009 CPU 1 irqstacks, hard=c04e3000 soft=c04c3000
> Feb 15 12:19:19 cl009 Initializing CPU#1
> Feb 15 12:19:19 cl009 Calibrating delay using timer specific routine.. 1733.57 BogoMIPS (lpj=3467154)
> Feb 15 12:19:19 cl009 CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 CPU: L1 I cache: 16K, L1 D cache: 16K
> Feb 15 12:19:19 cl009 CPU: L2 cache: 256K
> Feb 15 12:19:19 cl009 CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
> Feb 15 12:19:19 cl009 Intel machine check architecture supported.
> Feb 15 12:19:19 cl009 Intel machine check reporting enabled on CPU#1.
> Feb 15 12:19:19 cl009 CPU1: Intel Pentium III (Coppermine) stepping 06
> Feb 15 12:19:19 cl009 APIC error on CPU1: 00(40)
> 
> 
> I've been able to reproduce this error 3 out of 3 times on this
> particular system.  It is a Pentium III with the following
> /proc/cpuinfo:
> 
>  processor       : 0
>  vendor_id       : GenuineIntel
>  cpu family      : 6
>  model           : 8
>  model name      : Pentium III (Coppermine)
>  stepping        : 6
>  cpu MHz         : 866.932
>  cache size      : 256 KB
>  fdiv_bug        : no
>  hlt_bug         : no
>  f00f_bug        : no
>  coma_bug        : no
>  fpu             : yes
>  fpu_exception   : yes
>  cpuid level     : 2
>  wp              : yes
>  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
>  bogomips        : 1736.35
> 
>  processor       : 1
>  vendor_id       : GenuineIntel
>  cpu family      : 6
>  model           : 8
>  model name      : Pentium III (Coppermine)
>  stepping        : 6
>  cpu MHz         : 866.932
>  cache size      : 256 KB
>  fdiv_bug        : no
>  hlt_bug         : no
>  f00f_bug        : no
>  coma_bug        : no
>  fpu             : yes
>  fpu_exception   : yes
>  cpuid level     : 2
>  wp              : yes
>  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
>  bogomips        : 1733.57
> 
> 
> Bryce
