On Thu, 5 Oct 2006 18:45:55 -0700 Bryce Harrington wrote:

> On recent -git kernels, I'm getting an Oops when running the
> lhcs_regression test suite. I can replicate the issue with the
> following steps:
>
> amd01 ~ # uname -a
> Linux amd01 2.6.18-git22 #1 SMP PREEMPT Thu Oct 5 16:41:35 GMT 2006
> x86_64 AMD Opteron(tm) Processor 242 GNU/Linux
> amd01 ~ # cat /proc/cpuinfo | grep processor
> processor : 0
> processor : 1
> amd01 ~ # find /sys/devices/system/cpu/cpu*/online
> /sys/devices/system/cpu/cpu1/online
> amd01 ~ # echo 0 > /sys/devices/system/cpu/cpu1/online
> amd01 ~ # cat /proc/cpuinfo | grep processor
> processor : 0
> amd01 ~ # echo 1 > /sys/devices/system/cpu/cpu1/online
> amd01 ~ # cat /proc/cpuinfo | grep processor
> processor : 0
> processor : 1
> amd01 ~ # echo 0 > /sys/devices/system/cpu/cpu1/online
>
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> [<ffffffff80255287>] __drain_pages+0x29/0x5f
> PGD 7e56d067 PUD 7ee80067 PMD 0
> Oops: 0000 [1] PREEMPT SMP
> CPU 0
> Modules linked in:
> Pid: 7203, comm: bash Tainted: G M 2.6.18-git22 #1

Hi Bryce,

'M' says that the machine has experienced an abnormal "machine check",
like a processor or memory fault. Do you have a log that shows that?

The message log (URL) below isn't for this same kernel oops (maybe the
same oops, but it's from a 2.6.19-rc1 kernel). Bug reports should use
the same info throughout.
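For reference, here is a rough sketch of how the taint letters on the
"Pid:" line could be decoded. Only the meaning of 'M' comes from this
thread; the other entries in the table are my understanding of the
taint flags in kernels of this era and should be checked against the
kernel source. The names decode_taint and TAINT_FLAGS are made up for
this example.

```python
import re

# Taint letters as I understand them for 2.6.18-era kernels.
# Only 'M' is confirmed in this thread; treat the rest as assumptions.
TAINT_FLAGS = {
    "P": "proprietary module has been loaded",
    "G": "all loaded modules are GPL-compatible",
    "F": "a module was force-loaded",
    "S": "SMP kernel running on hardware not certified SMP-safe",
    "R": "a module was force-unloaded",
    "M": "a machine check exception has occurred",
    "B": "a bad page state was detected",
}


def decode_taint(line):
    """Return (flag, meaning) pairs for the taint letters in an oops line.

    Returns an empty list when the line has no 'Tainted:' field
    (for example, a 'Not tainted' kernel).
    """
    # Capture the run of single capital letters that follows "Tainted:".
    match = re.search(r"Tainted:\s*((?:[A-Z]\s*)+)", line)
    if match is None:
        return []
    return [
        (flag, TAINT_FLAGS.get(flag, "unknown flag"))
        for flag in match.group(1)
        if flag.isalpha()
    ]


if __name__ == "__main__":
    pid_line = "Pid: 7203, comm: bash Tainted: G M 2.6.18-git22 #1"
    for flag, meaning in decode_taint(pid_line):
        print(f"{flag}: {meaning}")
```

Run against the "Pid:" line quoted above, this reports the G and M
flags, which is why the question about a machine check log matters.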
> RIP: 0010:[<ffffffff80255287>] [<ffffffff80255287>] __drain_pages+0x29/0x5f
> RSP: 0018:ffff81003f1b3dd8 EFLAGS: 00010082
> RAX: 0000000000000001 RBX: 0000000000000082 RCX: 0000000000000000
> RDX: ffff81000000c580 RSI: 00000000fffffffe RDI: ffff81000000c000
> RBP: 0000000000000000 R08: 00000000fffffffe R09: ffff81007f1e63f0
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff81000000c580
> R13: 0000000000000001 R14: 0000000000000001 R15: ffff81003f1b3f50
> FS: 00002ab3e8a136d0(0000) GS:ffffffff806e3000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000007eea1000 CR4: 00000000000006e0
> Process bash (pid: 7203, threadinfo ffff81003f1b2000, task ffff81003f1c07c0)
> Stack: 0000000000000001 0000000000000001 0000000000000007 0000000000000007
> 0000000000000003 ffffffff802564b6 ffffffff8060d320 ffffffff805644a7
> ffffffff8060dbe0 0000000000000001 0000000000000001 ffffffff8023c49d
> Call Trace:
> [<ffffffff802564b6>] page_alloc_cpu_notify+0x12/0x28
> [<ffffffff805644a7>] notifier_call_chain+0x23/0x32
> [<ffffffff8023c49d>] blocking_notifier_call_chain+0x22/0x36
> [<ffffffff80248ff9>] _cpu_down+0x17f/0x23d
> [<ffffffff802490de>] cpu_down+0x27/0x3c
> [<ffffffff804419c7>] store_online+0x0/0x6b
> [<ffffffff804419ec>] store_online+0x25/0x6b
> [<ffffffff802ac4b7>] sysfs_write_file+0xad/0xd7
> [<ffffffff80273ff7>] vfs_write+0xaf/0x14e
> [<ffffffff80274149>] sys_write+0x45/0x6e
> [<ffffffff8020965e>] system_call+0x7e/0x83
>
>
> Code: 8b 75 00 48 8d 55 10 31 c9 4c 89 e7 e8 32 fa ff ff c7 45 00
> RIP [<ffffffff80255287>] __drain_pages+0x29/0x5f
> RSP <ffff81003f1b3dd8>
> CR2: 0000000000000000
>
>
> I am also seeing this issue on linux-2.6.19-rc1.
>
> The test output that identified the presence of the bug:
> http://crucible.osdl.org/runs/2416/test_output/lhcs_regression.log
>
> Some info about the amd64 system is available here:
> http://crucible.osdl.org/runs/2416/sysinfo/amd01.console
> http://crucible.osdl.org/runs/2416/sysinfo/amd01.messages
> http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/proc/
> http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/INFO/
> http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/etc/
>
> I do not see this issue on the x86, ia64, or x86_64 Xeon systems.
> I can provide additional detail about these systems if needed.

---
~Randy