On Mon, Jul 08, 2013 at 10:59:53AM -0400, Dave Anderson wrote:
>
>
> ----- Original Message -----
> > On Wed, Jul 03, 2013 at 09:04:50AM -0400, Dave Anderson wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > Hey there,
> > > >
> > > > I'm trying to analyse the vmcore come from an oops caused by a module.
> > > > The module comes from here:
> > > >
> > > > http://www.linuxforu.com/2011/01/understanding-a-kernel-oops
> > > >
> > > > This web page wants to teach how to analyse kernel oops. It provided a
> > > > module named 'oops', which triggers a NULL pointer dereference in its
> > > > init function.
> > > >
> > > > The problem is I cannot figure out how to use crash to analyse vmcore:
> > > >
> > > > GNU gdb (GDB) 7.0
> > > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "powerpc64-unknown-linux-gnu"...
> > > >
> > > >       KERNEL: /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux
> > > >     DUMPFILE: /var/crash/127.0.0.1-2013-07-01-04:43/vmcore
> > > >         CPUS: 20
> > > >         DATE: Mon Jul 1 04:38:49 2013
> > > >       UPTIME: 00:33:44
> > > > LOAD AVERAGE: 0.22, 0.18, 0.07
> > > >        TASKS: 482
> > > >     NODENAME: lawlp3.upt.austin.ibm.com
> > > >      RELEASE: 2.6.18-348.el5
> > > >      VERSION: #1 SMP Wed Nov 28 21:23:52 EST 2012
> > > >      MACHINE: ppc64 (3550 Mhz)
> > > >       MEMORY: 3.2 GB
> > > >        PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
> > > >          PID: 5402
> > > >      COMMAND: "insmod"
> > > >         TASK: c0000000cfa35150 [THREAD_INFO: c0000000ce5d0000]
> > > >          CPU: 15
> > > >        STATE: TASK_RUNNING (PANIC)
> > > >
> > > > crash> log
> > > > ... ....
> > > > oops: module license 'unspecified' taints kernel.
> > > > oops from the module
> > > > Unable to handle kernel paging request for data at address 0x00000000
> > > > Faulting instruction address: 0xd000000001460060
> > > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > > SMP NR_CPUS=128 NUMA
> > > > Modules linked in: oops(PU) nfsd exportfs auth_rpcgss autofs4 hidp nfs
> > > > nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp
> > > > ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_addr
> > > > ib_cm ib_sa ib_mad iw_cm iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api
> > > > uio cxgb3i libcxgbi libiscsi_tcp libiscsi2 scsi_transport_iscsi2
> > > > scsi_transport_iscsi dm_multipath scsi_dh snd_powermac snd_seq_dummy
> > > > snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> > > > snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore i2c_core
> > > > parport_pc lp parport sg iw_cxgb3 ib_core cxgb3 ibmveth 8021q dm_raid45
> > > > dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror
> > > > dm_log dm_mod lpfc ibmvfc scsi_transport_fc ibmvscsic sd_mod scsi_mod
> > > > ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> > > > NIP: D000000001460060 LR: D000000001460050 CTR: 0000000000000004
> > > > REGS: c0000000ce5d39b0 TRAP: 0300 Tainted: P ---- (2.6.18-348.el5)
> > > > MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022482 XER: 00000006
> > > > DAR: 0000000000000000, DSISR: 0000000042000000
> > > > TASK = c0000000cfa35150[5402] 'insmod' THREAD: c0000000ce5d0000 CPU: 15
> > > > GPR00: D000000001460050 C0000000CE5D3C30 D00000000146C930 0000000000000000
> > > > GPR04: 8000000000001032 0000000000000000 0000000000000000 0000000000000000
> > > > GPR08: 0000000000000000 0000000000000000 C0000000015FBB68 0000000000000000
> > > > GPR12: 0000000000000000 C000000000570B80 0000000000000000 D0000000012B1850
> > > > GPR16: D0000000012B1810 D0000000014601B0 0000000000000000 0000000000000000
> > > > GPR20: 0000000000000028 D0000000012B0CE9 C0000000005A12E8 0000000000000029
> > > > GPR24: D0000000012A0000 000000000000002A C0000000CD6F5A80 C0000000CD6F5AB0
> > > > GPR28: C0000000005A18C8 D000000001460680 D00000000146C900 D000000001460680
> > > > NIP [D000000001460060] .my_oops_init+0x2c/0xd4 [oops]
> > > > LR [D000000001460050] .my_oops_init+0x1c/0xd4 [oops]
> > > > Call Trace:
> > > > [C0000000CE5D3C30] [C000000000098944] .sys_init_module+0x1a88/0x1d18 (unreliable)
> > > > [C0000000CE5D3E30] [C0000000000086A4] syscall_exit+0x0/0x40
> > > > Instruction dump:
> > > > 4e800020 7c0802a6 fbc1fff0 ebc28000 f8010010 f821ff81 e87e8008 4800002d
> > > > e8410028 39200000 38210080 38600000 <91290000> e8010010 ebc1fff0 7c0803a6
> > > > <0>Sending IPI to other cpus...
> > > > crash> whatis my_oops_init
> > > > whatis: gdb request failed: whatis my_oops_init
> > > > crash> mod -s oops
> > > >      MODULE       NAME   SIZE  OBJECT FILE
> > > > d000000001460680  oops  18752  /lib/modules/2.6.18-348.el5/kernel/oops.ko
> > > > crash> whatis my_oops_init
> > > > int my_oops_init(void);
> > > > crash> dis -l .my_oops_init
> > > > <nothing outputed>
> > > > crash> sym -m oops
> > > > d000000001460000 MODULE START: oops
> > > > d000000001460000 (t) .my_oops_exit
> > > > d000000001460000 (t) .cleanup_module
> > > > d000000001460034 (t) .my_oops_init
> > > > d000000001460034 (t) .init_module
> > > > d000000001460130 (r) ____versions
> > > > d000000001460130 (r) __versions
> > > > d000000001460680 (D) __this_module
> > > > d000000001464910 (D) cleanup_module
> > > > d000000001464910 (d) my_oops_exit
> > > > d000000001464920 (D) init_module
> > > > d000000001464920 (d) my_oops_init
> > > > d000000001464940 MODULE END: oops
> > > > crash> bt
> > > > PID: 5402  TASK: c0000000cfa35150  CPU: 15  COMMAND: "insmod"
> > > >
> > > >  R0: d000000001460050  R1: c0000000ce5d3c30  R2: d00000000146c930
> > > >  R3: 0000000000000000  R4: 8000000000001032  R5: 0000000000000000
> > > >  R6: 0000000000000000  R7: 0000000000000000  R8: 0000000000000000
> > > >  R9: 0000000000000000  R10: c0000000015fbb68  R11: 0000000000000000
> > > >  R12: 0000000000000000  R13: c000000000570b80  R14: 0000000000000000
> > > >  R15: d0000000012b1850  R16: d0000000012b1810  R17: d0000000014601b0
> > > >  R18: 0000000000000000  R19: 0000000000000000  R20: 0000000000000028
> > > >  R21: d0000000012b0ce9  R22: c0000000005a12e8  R23: 0000000000000029
> > > >  R24: d0000000012a0000  R25: 000000000000002a  R26: c0000000cd6f5a80
> > > >  R27: c0000000cd6f5ab0  R28: c0000000005a18c8  R29: d000000001460680
> > > >  R30: d00000000146c900  R31: d000000001460680
> > > >  NIP: d000000001460060  MSR: 8000000000009032  OR3: c0000000005a13c0
> > > >  CTR: 0000000000000004  LR: d000000001460050  XER: 0000000000000006
> > > >  CCR: 0000000024022482  MQ: c0000000cd6f5ab0  DAR: 0000000000000000
> > > >  DSISR: 0000000042000000  Syscall Result: 0000000000000000
> > > >  NIP [d000000001460060] .init_module
> > > >  LR [d000000001460050] .init_module
> > > >
> > > > #0 [c0000000ce5d3c30] .sys_init_module at c000000000098944
> > > > #1 [c0000000ce5d3e30] syscall_exit at c0000000000086a4
> > > >  syscall [c00] exception frame:
> > > >  R0: 0000000000000080  R1: 00000000ff91fb60  R2: 000000000fff8eb0
> > > >  R3: 0000000010020028  R4: 000000000001caf8  R5: 0000000010020018
> > > >  R6: 000000000000002d  R7: fffffffffeff0000  R8: 000000000002ffe0
> > > >  R9: 0000000000000000  R10: 0000000000000000  R11: 0000000000000000
> > > >  R12: 0000000000000000  R13: 000000001001959c  R14: 0000000000000000
> > > >  R15: 0000000000000000  R16: 0000000000000000  R17: 0000000000000000
> > > >  R18: 0000000000000000  R19: 0000000000000000  R20: 0000000000000000
> > > >  R21: 0000000000000000  R22: 0000000000000000  R23: 0000000000000000
> > > >  R24: 000000000ffbf280  R25: 00000000ff91fdf0  R26: 0000000010020018
> > > >  R27: 00000000ff91ff05  R28: 0000000000020000  R29: 000000000001caf8
> > > >  R30: 0000000010020028  R31: 0000000000000003
> > > >  NIP: 000000000ff0496c  MSR: 000000000000d032  OR3: 0000000010020028
> > > >  CTR: 000000000ff04964  LR: 0000000010000bf8  XER: 0000000000000000
> > > >  CCR: 0000000044000484  MQ: 0000000002756c28  DAR: 000000001004002c
> > > >  DSISR: 0000000042000000  Syscall Result: 0000000000000000
> > > >
> > > > crash>
> > > >
> > > > as you can see, the 'bt' command says the problem is at '.init_module',
> > > > but in fact it should come from '.my_oops_init'. But 'dis -l
> > > > .my_oops_init' shows nothing. I cannot use crash to figure out which line
> > > > of source code caused the oops. But using gdb as being stated in the web
> > > > page I can find the code line easily.
> > > >
> > > > Please help. Thanks.
> > >
> > > I'm not well-versed in ppc64, but the issue seems to be related
> > > to the fact that .my_oops_init and .init_module are both being
> > > assigned the same virtual address:
> > >
> > > d000000001460034 (t) .my_oops_init
> > > d000000001460034 (t) .init_module
> > >
> > > If you do an "nm -Bn" on the oops.ko file, do they show the same
> > > offset value?
> >
> > Thanks, Dave. Looks like they have the same offset, both are zero:
> >
> > $ nm -Bn oops.ko
> >                  U .printk
> > 0000000000000000 T .cleanup_module
> > 0000000000000000 T .init_module
> > 0000000000000000 t .my_oops_exit
> > 0000000000000000 t .my_oops_init
> > 0000000000000000 r ____versions
> > 0000000000000000 r __mod_srcversion29
> > 0000000000000000 D __this_module
> > 0000000000000000 D cleanup_module
> > 0000000000000000 d my_oops_exit
> > 0000000000000010 D init_module
> > 0000000000000010 d my_oops_init
> > 0000000000000028 r __module_depends
> > 0000000000000038 r __mod_vermagic5
> >
> > But why gdb isn't affected by the same offset?
>
> There is some confusion with the ppc64 usage of the symbol name with
> and without the "." preceding the name, i.e. the actual (t) text symbol
> of .my_oops_init versus the (D) data symbol of my_oops_init.
>
> $ gdb /root/oops.ko
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ppc64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/oops.ko...done.
> (gdb) disassemble .my_oops_init
> A syntax error in expression, near `.my_oops_init'.
> (gdb) disassemble my_oops_init
> Dump of assembler code for function my_oops_init:
>    0x0000000000000034 <+0>:   mflr    r0
>    0x0000000000000038 <+4>:   std     r30,-16(r1)
>    0x000000000000003c <+8>:   ld      r30,0(r2)
>    0x0000000000000040 <+12>:  std     r0,16(r1)
>    0x0000000000000044 <+16>:  stdu    r1,-128(r1)
>    0x0000000000000048 <+20>:  ld      r3,-32760(r30)
>    0x000000000000004c <+24>:  bl      0x4c <my_oops_init+24>
>    0x0000000000000050 <+28>:  nop
>    0x0000000000000054 <+32>:  li      r9,0
>    0x0000000000000058 <+36>:  addi    r1,r1,128
>    0x000000000000005c <+40>:  li      r3,0
>    0x0000000000000060 <+44>:  stw     r9,0(r9)
>    0x0000000000000064 <+48>:  ld      r0,16(r1)
>    0x0000000000000068 <+52>:  ld      r30,-16(r1)
>    0x000000000000006c <+56>:  mtlr    r0
>    0x0000000000000070 <+60>:  blr
> End of assembler dump.
> (gdb)
>
> Anyway, the crash utility "dis .my_oops_init" convenience command stops
> immediately because it sees that it has already reached the "next" symbol
> value of .init_module.
> You could add an instruction count to force it
> to continue:
>
> crash> dis .my_oops_init
> crash> dis .my_oops_init 20
> 0xd0000000046f0034 <.init_module>:     mflr    r0
> 0xd0000000046f0038 <.init_module+4>:   std     r30,-16(r1)
> 0xd0000000046f003c <.init_module+8>:   ld      r30,-32768(r2)
> 0xd0000000046f0040 <.init_module+12>:  std     r0,16(r1)
> 0xd0000000046f0044 <.init_module+16>:  stdu    r1,-128(r1)
> 0xd0000000046f0048 <.init_module+20>:  ld      r3,-32760(r30)
> 0xd0000000046f004c <.init_module+24>:  bl      0xd0000000046f0078
> 0xd0000000046f0050 <.init_module+28>:  ld      r2,40(r1)
> 0xd0000000046f0054 <.init_module+32>:  li      r9,0
> 0xd0000000046f0058 <.init_module+36>:  addi    r1,r1,128
> 0xd0000000046f005c <.init_module+40>:  li      r3,0
> 0xd0000000046f0060 <.init_module+44>:  stw     r9,0(r9)
> 0xd0000000046f0064 <.init_module+48>:  ld      r0,16(r1)
> 0xd0000000046f0068 <.init_module+52>:  ld      r30,-16(r1)
> 0xd0000000046f006c <.init_module+56>:  mtlr    r0
> 0xd0000000046f0070 <.init_module+60>:  blr
> 0xd0000000046f0074 <.init_module+64>:  .long 0x0
> 0xd0000000046f0078 <.init_module+68>:  addis   r12,r2,-1
> 0xd0000000046f007c <.init_module+72>:  addi    r12,r12,32544
> 0xd0000000046f0080 <.init_module+76>:  std     r2,40(r1)
> crash>
>
> Or just force it stop at the instruction that cause the crash:
>
> crash> dis -r d0000000046f0060
> 0xd0000000046f0034 <.init_module>:     mflr    r0
> 0xd0000000046f0038 <.init_module+4>:   std     r30,-16(r1)
> 0xd0000000046f003c <.init_module+8>:   ld      r30,-32768(r2)
> 0xd0000000046f0040 <.init_module+12>:  std     r0,16(r1)
> 0xd0000000046f0044 <.init_module+16>:  stdu    r1,-128(r1)
> 0xd0000000046f0048 <.init_module+20>:  ld      r3,-32760(r30)
> 0xd0000000046f004c <.init_module+24>:  bl      0xd0000000046f0078
> 0xd0000000046f0050 <.init_module+28>:  ld      r2,40(r1)
> 0xd0000000046f0054 <.init_module+32>:  li      r9,0
> 0xd0000000046f0058 <.init_module+36>:  addi    r1,r1,128
> 0xd0000000046f005c <.init_module+40>:  li      r3,0
> 0xd0000000046f0060 <.init_module+44>:  stw     r9,0(r9)
> crash>
>
> Dave
>
Thanks, Dave.
But how can we get 'dis -l' to work here, please? With an instruction
count it disassembles, but it prints no source line information:

crash> dis -l .my_oops_init 20
0xd000000001460034 <.init_module>:     mflr    r0
0xd000000001460038 <.init_module+4>:   std     r30,-16(r1)
0xd00000000146003c <.init_module+8>:   ld      r30,-32768(r2)
0xd000000001460040 <.init_module+12>:  std     r0,16(r1)
0xd000000001460044 <.init_module+16>:  stdu    r1,-128(r1)
0xd000000001460048 <.init_module+20>:  ld      r3,-32760(r30)
0xd00000000146004c <.init_module+24>:  bl      0xd000000001460078
0xd000000001460050 <.init_module+28>:  ld      r2,40(r1)
0xd000000001460054 <.init_module+32>:  li      r9,0
0xd000000001460058 <.init_module+36>:  addi    r1,r1,128
0xd00000000146005c <.init_module+40>:  li      r3,0
0xd000000001460060 <.init_module+44>:  stw     r9,0(r9)
0xd000000001460064 <.init_module+48>:  ld      r0,16(r1)
0xd000000001460068 <.init_module+52>:  ld      r30,-16(r1)
0xd00000000146006c <.init_module+56>:  mtlr    r0
0xd000000001460070 <.init_module+60>:  blr
0xd000000001460074 <.init_module+64>:  .long 0x0
0xd000000001460078 <.init_module+68>:  addis   r12,r2,-1
0xd00000000146007c <.init_module+72>:  addi    r12,r12,14152
0xd000000001460080 <.init_module+76>:  std     r2,40(r1)
crash>
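
For reference, here is a small Python sketch of the behaviour Dave describes. It is only a model built on assumptions, not the crash utility's actual code: the helper names (naive_extent, extent_to_next_address, fault_offset) are made up for illustration, and the symbol values are copied from the "sym -m oops" output quoted above. It shows that bounding a function by "the next symbol in the table" gives a zero-length range when two text symbols share an address, and that the faulting offset can still be computed from the NIP.

```python
# Symbol values copied from the "sym -m oops" output in this thread.
TEXT_SYMBOLS = [
    (0xd000000001460000, ".my_oops_exit"),
    (0xd000000001460000, ".cleanup_module"),
    (0xd000000001460034, ".my_oops_init"),
    (0xd000000001460034, ".init_module"),
    (0xd000000001460130, "____versions"),
]

# Faulting instruction address from the oops log.
NIP = 0xd000000001460060


def naive_extent(symbols, name):
    """Bytes from `name` to the very next table entry (assumed model)."""
    for i, (addr, sym) in enumerate(symbols):
        if sym == name:
            return symbols[i + 1][0] - addr
    raise KeyError(name)


def extent_to_next_address(symbols, name):
    """Upper bound: bytes from `name` to the next strictly higher address."""
    base = next(addr for addr, sym in symbols if sym == name)
    return min(addr for addr, _ in symbols if addr > base) - base


def fault_offset(symbols, name, nip):
    """Offset of the faulting instruction from the start of `name`."""
    base = next(addr for addr, sym in symbols if sym == name)
    return nip - base


if __name__ == "__main__":
    # Aliased symbol directly follows, so the naive range is empty.
    print(naive_extent(TEXT_SYMBOLS, ".my_oops_init"))             # 0
    # Skipping aliases gives a usable (over-estimated) range.
    print(hex(extent_to_next_address(TEXT_SYMBOLS, ".my_oops_init")))  # 0xfc
    # Matches ".my_oops_init+0x2c" in the oops log.
    print(hex(fault_offset(TEXT_SYMBOLS, ".my_oops_init", NIP)))   # 0x2c
```

The 0x2c offset agrees with both the ".my_oops_init+0x2c/0xd4" line in the log and the "<.init_module+44>" line in the disassembly. As a possible workaround outside crash, that offset could be given to gdb on oops.ko (for example "list *(my_oops_init+0x2c)"), assuming the module was built with debug information.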