need some help with debugging

Bernd Schubert <bschubert@xxxxxxxxx> · Tue, 31 Jul 2007 13:58:33 +0200

Hi,

From time to time I see oopses that are rather hard to debug; at least gdb
is not able to figure out which file:line the crash address belongs to.
Finding the file and function is not difficult using grep, kscope, etc.,
but that doesn't tell me where inside the function it crashed.

Here's an example:

mddeaugnxbeo001 login: [ 4754.784895] Unable to handle kernel NULL pointer dere
[ 4754.790687]  [<0000000000000000>] _stext+0x7fdff0e0/0xe0
[ 4754.798858] PGD 1191c5067 PUD 1191c1067 PMD 0
[ 4754.803597] Oops: 0010 [1] SMP
[ 4754.806944] CPU 3
[ 4754.809078] Modules linked in: rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbh
[ 4754.846970] Pid: 0, comm: swapper Not tainted 2.6.20.14 #1
[ 4754.852744] RIP: 0010:[<0000000000000000>]  [<0000000000000000>] _stext+0x7f0
[ 4754.861195] RSP: 0018:ffff8100016c3e18  EFLAGS: 00010246
[ 4754.866808] RAX: ffff81011d851fd8 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4754.874333] RDX: 0000000000008948 RSI: ffff8100016c3e30 RDI: ffff810037d66000
[ 4754.881857] RBP: ffff810037d66000 R08: ffff8100016c3e40 R09: ffff81011d851fd8
[ 4754.889372] R10: ffff81011d851fd8 R11: ffff810037d457d0 R12: ffff8100016c3e30
[ 4754.896902] R13: ffff8100016c3e40 R14: 0000000000000000 R15: ffff81011d814000
[ 4754.904435] FS:  0000000040039940(0000) GS:ffff81011d8332c0(0000) knlGS:00000
[ 4754.912981] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 4754.919051] CR2: 0000000000000000 CR3: 00000001191bb000 CR4: 00000000000006e0
[ 4754.926586] Process swapper (pid: 0, threadinfo ffff81011d850000, task ffff8)
[ 4754.935101] Stack:  ffffffff880f08f9 ffff810080033e80 0000000000000001 000005
[ 4754.943647]  0000000000000000 796d000000010000 ffff810080032ac0 ffff8100016c0
[ 4754.951484]  ffffffff80223d37 ffff81011d4bfa00 ffff81011b95b500 ffff810037d60
[ 4754.959145] Call Trace:
[ 4754.961940]  <IRQ>  [<ffffffff880f08f9>] :bonding:bond_check_dev_link+0xcd/00
[ 4754.969761]  [<ffffffff80223d37>] load_balance+0x7f/0x2a1
[ 4754.975455]  [<ffffffff880f1fc4>] :bonding:bond_mii_monitor+0x9f/0x425
[ 4754.982365]  [<ffffffff880f1f25>] :bonding:bond_mii_monitor+0x0/0x425
[ 4754.989174]  [<ffffffff80232f94>] run_timer_softirq+0x149/0x1b2
[ 4754.995406]  [<ffffffff88002583>] :forcedeth:nv_nic_irq+0x160/0x229
[ 4755.002012]  [<ffffffff8022f2aa>] __do_softirq+0x50/0xbb
[ 4755.007612]  [<ffffffff8020a7cc>] call_softirq+0x1c/0x28
[ 4755.013200]  [<ffffffff8020c2f2>] do_softirq+0x2e/0x94
[ 4755.018606]  [<ffffffff80214630>] smp_apic_timer_interrupt+0x54/0x66
[ 4755.025307]  [<ffffffff80207945>] default_idle+0x0/0x41
[ 4755.030833]  [<ffffffff8020a276>] apic_timer_interrupt+0x66/0x70
[ 4755.037188]  <EOI>  [<ffffffff80207972>] default_idle+0x2d/0x41
[ 4755.043452]  [<ffffffff80207aed>] cpu_idle+0x51/0x74
[ 4755.048713]
[ 4755.050273]
[ 4755.050274] Code:  Bad RIP value.
[ 4755.055384] RIP  [<0000000000000000>] _stext+0x7fdff0e0/0xe0
[ 4755.061360]  RSP <ffff8100016c3e18>
[ 4755.065036] CR2: 0000000000000000
[ 4755.068528]  <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

Looking at it, it already looks strange:

[ 4754.790687]  [<0000000000000000>] _stext+0x7fdff0e0/0xe0

Hmm, so it crashed at _stext+0x7fdff0e0, but _stext has only 0xe0 bytes?
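A quick sanity check on that offset, as a sketch only: the faulting RIP is 0, and a printed offset is simply "address minus symbol start" in unsigned 64-bit arithmetic. Assuming the symbol lookup just fell back to _stext for address 0 (an assumption, the oops does not say so), the offset implies where _stext would have to sit. The helper name implied_symbol_start is made up for this illustration.

```python
MASK64 = (1 << 64) - 1  # addresses on x86_64 wrap modulo 2**64


def implied_symbol_start(address: int, offset: int) -> int:
    """Return the symbol start that would yield `offset` for `address`."""
    return (address - offset) & MASK64


rip = 0x0            # RIP value from the oops
offset = 0x7fdff0e0  # offset printed after "_stext+"

start = implied_symbol_start(rip, offset)
print(hex(start))  # 0xffffffff80200f20
```

That result lies in the same 0xffffffff802... range as the core kernel addresses in the call trace (load_balance is at 0xffffffff80223d37), so the huge offset looks like wraparound from symbolizing address 0 rather than a real position inside _stext.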

[ 4754.969761]  [<ffffffff80223d37>] load_balance+0x7f/0x2a1

This is clearly left over on the stack. So the interesting part, the caller of
_stext(), should be

[ 4754.961940]  <IRQ>  [<ffffffff880f08f9>] :bonding:bond_check_dev_link+0xcd/00

This looks strange again: the function has an overall size of 00 bytes?
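To compare trace entries mechanically, here is a minimal sketch of a parser for lines in the format shown above. The names TraceEntry and parse_entry are hypothetical. It treats the size field as optional, on the assumption that the trailing "/00" is the tail of a field cut off at the console width (other long lines in this log also end abruptly) rather than a real size.

```python
import re
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    address: int
    module: Optional[str]  # None for symbols in the core kernel
    symbol: str
    offset: int
    size: Optional[int]    # None when the field is missing or cut off


# Matches "[<addr>] :module:symbol+0xoff/0xsize"; module and size are optional.
ENTRY_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)"
    r"(?:/0x(?P<size>[0-9a-f]+))?"
)


def parse_entry(line: str) -> Optional[TraceEntry]:
    """Parse one call-trace line; return None if it has no trace entry."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    size = m.group("size")
    return TraceEntry(
        address=int(m.group("addr"), 16),
        module=m.group("module"),
        symbol=m.group("symbol"),
        offset=int(m.group("offset"), 16),
        size=int(size, 16) if size is not None else None,
    )


full = "[ 4754.975455]  [<ffffffff880f1fc4>] :bonding:bond_mii_monitor+0x9f/0x425"
cut = "[ 4754.961940]  <IRQ>  [<ffffffff880f08f9>] :bonding:bond_check_dev_link+0xcd/00"

for line in (full, cut):
    print(parse_entry(line))
```

For the truncated line the size comes back as None, while address, module, symbol and offset are still usable. A line whose offset itself was cut off would parse with a wrong offset, so such entries need checking by hand.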

The kernel wasn't compiled with debugging support, so I recompiled it and
am now trying to figure out the corresponding line.

(gdb) p bond_check_dev_link
$1 = {int (struct bonding *, struct net_device *, int)} 0x82c <bond_check_dev_link>
(gdb) l *(0x82c + 0xcd)
No source file for address 0x8f9.
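The address arithmetic in that gdb query can be cross-checked against the call trace. This is a sketch under two assumptions: that the rebuilt module has the same layout as the one that crashed, and that the 0x82c printed by gdb is relative to the start of the module's text. The variable names are made up for this illustration.

```python
runtime_return_addr = 0xffffffff880f08f9  # from the <IRQ> line of the oops
offset_in_function = 0xcd                 # the "+0xcd" on the same line
gdb_function_addr = 0x82c                 # gdb's value for bond_check_dev_link

# Where the function started in memory when the oops happened.
runtime_function_start = runtime_return_addr - offset_in_function

# Load address of the module text implied by the two views (assumption above).
implied_text_base = runtime_function_start - gdb_function_addr

# The address to look up in the unloaded object, as passed to "l *(...)".
object_relative_addr = gdb_function_addr + offset_in_function

print(hex(runtime_function_start))  # 0xffffffff880f082c
print(hex(implied_text_base))       # 0xffffffff880f0000
print(hex(object_relative_addr))    # 0x8f9
```

The implied base comes out page-aligned and the low bits agree in both views (0x8f9), so the address handed to gdb at least seems consistent with the oops.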

Any ideas how to figure out the proper line?

Thanks in advance,
Bernd

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ