[Hotplug_sig] [Oops] NULL ptr deref on AMD64 when offlining cpu twice

On recent -git kernels, I'm getting an Oops when running the
lhcs_regression test suite.  I can replicate the issue with the
following steps:

amd01 ~ # uname -a
Linux amd01 2.6.18-git22 #1 SMP PREEMPT Thu Oct 5 16:41:35 GMT 2006
x86_64 AMD Opteron(tm) Processor 242 GNU/Linux
amd01 ~ # cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
amd01 ~ # find /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online
amd01 ~ # echo 0 > /sys/devices/system/cpu/cpu1/online
amd01 ~ # cat /proc/cpuinfo | grep processor
processor       : 0
amd01 ~ # echo 1 > /sys/devices/system/cpu/cpu1/online
amd01 ~ # cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
amd01 ~ # echo 0 > /sys/devices/system/cpu/cpu1/online

 Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff80255287>] __drain_pages+0x29/0x5f
PGD 7e56d067 PUD 7ee80067 PMD 0
Oops: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 7203, comm: bash Tainted: G   M  2.6.18-git22 #1
RIP: 0010:[<ffffffff80255287>]  [<ffffffff80255287>] __drain_pages+0x29/0x5f
RSP: 0018:ffff81003f1b3dd8  EFLAGS: 00010082
RAX: 0000000000000001 RBX: 0000000000000082 RCX: 0000000000000000
RDX: ffff81000000c580 RSI: 00000000fffffffe RDI: ffff81000000c000
RBP: 0000000000000000 R08: 00000000fffffffe R09: ffff81007f1e63f0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81000000c580
R13: 0000000000000001 R14: 0000000000000001 R15: ffff81003f1b3f50
FS:  00002ab3e8a136d0(0000) GS:ffffffff806e3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007eea1000 CR4: 00000000000006e0
Process bash (pid: 7203, threadinfo ffff81003f1b2000, task ffff81003f1c07c0)
Stack:  0000000000000001 0000000000000001 0000000000000007 0000000000000007
 0000000000000003 ffffffff802564b6 ffffffff8060d320 ffffffff805644a7
 ffffffff8060dbe0 0000000000000001 0000000000000001 ffffffff8023c49d
Call Trace:
 [<ffffffff802564b6>] page_alloc_cpu_notify+0x12/0x28
 [<ffffffff805644a7>] notifier_call_chain+0x23/0x32
 [<ffffffff8023c49d>] blocking_notifier_call_chain+0x22/0x36
 [<ffffffff80248ff9>] _cpu_down+0x17f/0x23d
 [<ffffffff802490de>] cpu_down+0x27/0x3c
 [<ffffffff804419c7>] store_online+0x0/0x6b
 [<ffffffff804419ec>] store_online+0x25/0x6b
 [<ffffffff802ac4b7>] sysfs_write_file+0xad/0xd7
 [<ffffffff80273ff7>] vfs_write+0xaf/0x14e
 [<ffffffff80274149>] sys_write+0x45/0x6e
 [<ffffffff8020965e>] system_call+0x7e/0x83


Code: 8b 75 00 48 8d 55 10 31 c9 4c 89 e7 e8 32 fa ff ff c7 45 00
RIP  [<ffffffff80255287>] __drain_pages+0x29/0x5f
 RSP <ffff81003f1b3dd8>
CR2: 0000000000000000
                                                                

I am also seeing this issue on linux-2.6.19-rc1.
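
For convenience, the manual steps above (offline, online, offline again) can be sketched as a small shell function. This is only an illustrative sketch, not part of the lhcs_regression suite: the function name cycle_cpu is hypothetical, and the default path simply assumes cpu1 as in the transcript above. The path is a parameter so the function can be pointed at another CPU's control file.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from lhcs_regression):
# write 0, 1, 0 to a CPU's sysfs "online" control file, i.e. offline the
# CPU, bring it back online, then offline it a second time.
cycle_cpu() {
    # Defaults to cpu1, the CPU used in the report; pass another path to override.
    online_file="${1:-/sys/devices/system/cpu/cpu1/online}"

    for state in 0 1 0; do
        echo "writing $state to $online_file"
        # Stop at the first failed write (for example, when not run as root).
        echo "$state" > "$online_file" || return 1
    done
}
```

Run as root, calling cycle_cpu with no arguments repeats the sequence shown in the transcript. On an affected kernel the final write would presumably trigger the Oops, so it should only be run on a test machine.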

The test output that identified the presence of the bug:
  http://crucible.osdl.org/runs/2416/test_output/lhcs_regression.log

Some info about the amd64 system is available here:
  http://crucible.osdl.org/runs/2416/sysinfo/amd01.console
  http://crucible.osdl.org/runs/2416/sysinfo/amd01.messages
  http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/proc/
  http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/INFO/
  http://crucible.osdl.org/runs/2416/sysinfo/amd01.1/etc/

I do not see this issue on the x86, ia64, or x86_64 Xeon systems.
I can provide additional detail about these systems if needed.

Bryce

