On Mon, Oct 09, 2006 at 02:40:24PM -0700, Randy Dunlap wrote:
> On Sat, 7 Oct 2006 21:57:49 +0000 Pavel Machek wrote:
>
> > Hi!
> >
> > > 1. Oops offlining cpu twice on AMD64 (but not on EM64t)
> > > with the 2.6.18-git22 kernel
> > >
> > > Reported to hotplug lists 10/05:
> > > http://lists.osdl.org/pipermail/hotplug_sig/2006-October/000680.html
> > >
> > > To recreate: offline, online, and then offline a CPU, then oopses
> > > http://crucible.osdl.org/runs/2397/sysinfo/amd01.console
> > > http://crucible.osdl.org/runs/2397/sysinfo/amd01.2/proc/config
> > >
> > > Here's a snippet of the oops:
> > >
> > > # echo 0 > /sys/devices/system/cpu/cpu1/online
> > >
> > > Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > [<ffffffff80255287>] __drain_pages+0x29/0x5f
> > > PGD 7e56d067 PUD 7ee80067 PMD 0
> > > Oops: 0000 [1] PREEMPT SMP
> > > CPU 0
> > > Modules linked in:
> > > Pid: 7203, comm: bash Tainted: G M 2.6.18-git22 #1
> >                                  ~~~~~
> > kernel is unhappy here. Forced module unload?
>
> Machine check exception. 'G' is Good, same place where 'P'
> for proprietary would be. But yes, kernel or machine is unhappy.

To follow up on this issue... I found a BIOS update for the
motherboard of this machine indicating it includes a fix for MCE
during hibernate operations; my guess is that cpu hotplug may be
triggering this bug.

Meanwhile, we checked against a couple of other AMD64 systems;
these are behaving correctly.

Anyway, thanks for the pointers; it sounds like this is probably
just a hardware issue. I'll report back if I find differently.

Bryce