Hi,
Full description of the problem:
Kernel version: 2.6.32.36
Oops information:
[9638271.695663] BUG: unable to handle kernel paging request at 0000000000a3ad90
[9638271.695685] IP: [<ffffffff800cddfa>] kfree+0x5a/0x200
[9638271.695701] PGD f94ff067 PUD fd652067 PMD 0
[9638271.695707] Oops: 0000 [#1] SMP
[9638271.695712] last sysfs file: /sys/devices/xen-backend/vbd-415-51776/statistics/wr_sect
Trap number:14, message:Oops
Error num: 0
Sigal Num:11_SIGSEGV
Event ID:DIE_OOPS
RIP: e030:[<ffffffff800cddfa>]
<ffffffff800cddfa>{kfree+0x5a}
RSP: e02b:ffff88001ce65da8 EFLAGS: 00010006
RAX: 0000000000a3ad90 RBX: 0000000000000000 RCX: 00000000000002eb
RDX: 00000000001761f0 RSI: 00000000000002eb RDI: ffff88002ec3e3e0
RBP: fffffffffffffffe R08: 0000000000000000 R09: ffff88002ec3e3e0
R10: ffffffffffffffff R11: ffffffff801b0e50 R12: 0000000000008001
R13: 0000000000000024 R14: 00000000ffffff9c R15: ffff88001ce65e48
FS: 00007fbe05e71700(0000) GS:ffff880002008000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000a3ad90 CR3: 00000000f9009000 CR4: 0000000000002620
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<kernel_trace>
<ffffffff80009b05>{dump_trace+0x65}
<ffffffff8037d897>{notifier_call_chain+0x37}
<ffffffff8005a1ed>{notify_die+0x2d}
<ffffffff8037bd0b>{__die+0x8b}
<ffffffff8001bed1>{no_context+0xd1}
<ffffffff8001c1f5>{__bad_area_nosemaphore+0x175}
<ffffffff8037b298>{page_fault+0x28}
<ffffffff800cddfa>{kfree+0x5a}
<ffffffff800da03d>{put_filp+0x1d}
<ffffffff800e7133>{do_filp_open+0x723}
<ffffffff800d62b7>{do_sys_open+0x97}
<ffffffff80007378>{system_call_fastpath+0x16}
[<00007fbe059c8040>]
</kernel_trace>
The following is my preliminary analysis:
crash> dis kfree
0xffffffff800cdda0 <kfree>: push %r15
0xffffffff800cdda2 <kfree+2>: push %r14
0xffffffff800cdda4 <kfree+4>: push %r13
0xffffffff800cdda6 <kfree+6>: push %r12
0xffffffff800cdda8 <kfree+8>: push %rbp
0xffffffff800cdda9 <kfree+9>: push %rbx
0xffffffff800cddaa <kfree+10>: sub $0x18,%rsp
0xffffffff800cddae <kfree+14>: cmp $0x10,%rdi
0xffffffff800cddb2 <kfree+18>: mov %rdi,0x8(%rsp)
0xffffffff800cddb7 <kfree+23>: jbe 0xffffffff800cde7c <kfree+220>
0xffffffff800cddbd <kfree+29>: mov %gs:0x67c1,%al
0xffffffff800cddc5 <kfree+37>: movb $0x1,%gs:0x67c1
0xffffffff800cddce <kfree+46>: mov %al,0x17(%rsp)
0xffffffff800cddd2 <kfree+50>: mov 0x8(%rsp),%rdi
0xffffffff800cddd7 <kfree+55>: mov 0x758872(%rip),%rbx # 0xffffffff80826650
0xffffffff800cddde <kfree+62>: callq 0xffffffff800228e0 <__phys_addr>
0xffffffff800cdde3 <kfree+67>: shr $0xc,%rax
0xffffffff800cdde7 <kfree+71>: lea 0x0(,%rax,8),%rdx
0xffffffff800cddef <kfree+79>: shl $0x6,%rax
0xffffffff800cddf3 <kfree+83>: sub %rdx,%rax
0xffffffff800cddf6 <kfree+86>: lea (%rbx,%rax,1),%rax
0xffffffff800cddfa <kfree+90>: mov (%rax),%rdx
0xffffffff800cddfd <kfree+93>: test $0x20000,%edx
0xffffffff800cde03 <kfree+99>: je 0xffffffff800cde1b <kfree+123>
0xffffffff800cde05 <kfree+101>: mov 0x10(%rax),%rax
0xffffffff800cde09 <kfree+105>: mov (%rax),%rdx
0xffffffff800cde0c <kfree+108>: test $0x20000,%edx
0xffffffff800cde12 <kfree+114>: je 0xffffffff800cde1b <kfree+123>
......
Normally %rbx should hold the value of mem_map, which is fixed on my system: mem_map itself is located at 0xffffffff80826650, and its value is 0xffff880004802000.
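The address arithmetic from kfree+55 to kfree+90 can be replayed outside the kernel to double-check the paragraph above. It appears to be the inlined virt_to_page(), i.e. mem_map + (__pa(x) >> 12) * sizeof(struct page). The Python sketch below uses only constants copied from the disassembly; the helper names are my own, and the 56-byte struct page size is inferred from the shl/sub sequence rather than read from the kernel headers:

```python
# Constants copied from the disassembly above.
KFREE_PLUS_62 = 0xffffffff800cddde  # address of the instruction after the mov at kfree+55
MOV_DISP32 = 0x758872               # displacement in "mov 0x758872(%rip),%rbx"
MASK64 = (1 << 64) - 1


def rip_relative_target(next_insn: int, disp32: int) -> int:
    """x86-64 resolves a RIP-relative operand against the address of the
    *next* instruction: target = next_rip + sign-extended disp32."""
    if disp32 & 0x80000000:
        disp32 -= 1 << 32
    return (next_insn + disp32) & MASK64


def page_offset(phys: int) -> int:
    """Mirror kfree+67..kfree+83: pfn = phys >> 12, then (pfn << 6) - pfn * 8."""
    pfn = phys >> 12
    rdx = pfn * 8                 # kfree+71: lea 0x0(,%rax,8),%rdx
    rax = (pfn << 6) & MASK64     # kfree+79: shl $0x6,%rax
    return (rax - rdx) & MASK64   # kfree+83: sub %rdx,%rax


mem_map_addr = rip_relative_target(KFREE_PLUS_62, MOV_DISP32)
struct_page_size = page_offset(1 << 12)  # offset of pfn 1 == size of one struct page
print(hex(mem_map_addr), struct_page_size)
```

Running it prints 0xffffffff80826650 56, matching the address crash annotated at kfree+55, so the load itself points at mem_map as expected.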
But here %rbx was 0x0000000000000000. In my opinion, the possible reasons are:
1. mem_map was changed for an unknown reason, so %rbx was loaded with a wrong value.
2. mem_map is correct, but %rip was wrong, so %rbx was loaded from the wrong address.
3. mem_map is correct and %rip is also correct, but %rbx was modified after it was loaded.
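The claim that %rbx was 0 at kfree+86 can be checked against the register dump alone. A Python sketch under one loudly labeled assumption: that __phys_addr() reduces a direct-map pointer by subtracting 0xffff880000000000 (the x86-64 PAGE_OFFSET in 2.6.32). The variable names are my own:

```python
# Register values copied from the oops above.
RDI = 0xffff88002ec3e3e0  # pointer passed to kfree()
RAX = 0x0000000000a3ad90  # address computed at kfree+86 and dereferenced at kfree+90
RDX = 0x00000000001761f0  # still holds pfn * 8 from kfree+71
CR2 = 0x0000000000a3ad90  # faulting address reported by the CPU
MEM_MAP_VALUE = 0xffff880004802000  # mem_map value on this system (from the text above)
MASK64 = (1 << 64) - 1

# ASSUMPTION: for a direct-map pointer, __phys_addr() just subtracts the
# x86-64 direct-map base (PAGE_OFFSET, 0xffff880000000000 in 2.6.32).
DIRECT_MAP_BASE = 0xffff880000000000

pfn_from_rdi = ((RDI - DIRECT_MAP_BASE) & MASK64) >> 12
pfn_from_rdx = RDX // 8                            # kfree+71: rdx = pfn * 8
offset = pfn_from_rdx * 56                         # kfree+79..+83: (pfn << 6) - pfn * 8
rbx_at_fault = (RAX - offset) & MASK64             # kfree+86: rax = rbx + offset
expected_page = (MEM_MAP_VALUE + offset) & MASK64  # what the expected mem_map would give

print(hex(pfn_from_rdi), hex(pfn_from_rdx), hex(rbx_at_fault), hex(expected_page))
```

Both routes give pfn 0x2ec3e, so __phys_addr() appears to have returned a sane value, and the recomputed %rbx is exactly 0; with the expected mem_map the struct page would have been at 0xffff88000523cd90. This confirms that the base added at kfree+86 was 0, but on its own it cannot tell which of the three reasons above zeroed it.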
I tried setting mem_map to 0x0000000000000000: the kernel panicked immediately, but no vmcore could be produced, whereas this problem did produce a vmcore (unfortunately, I lost that vmcore through carelessness).
So I think reason 1 can be excluded, which leaves reasons 2 and 3, but I don't know how either of them could happen.
I did not do anything before the system panicked, and I cannot reproduce this problem.