----- hutao@xxxxxxxxxxxxxx wrote: > On Thu, Sep 02, 2010 at 08:55:38AM -0400, Dave Anderson wrote: > > > > ----- hutao@xxxxxxxxxxxxxx wrote: > > > > > On Thu, Sep 02, 2010 at 03:46:00PM +0800, hutao@xxxxxxxxxxxxxx > wrote: > > > > Hi, > > > > > > > > I got a problem where it seemed crash got a bad backtrace. > > > > The problem occurred under the following conditions: > > > > On a qemu guest system loading a module that stuck at > > > > the init function(say, call a function that did deadlooping), > > > > then dumped the guest by `virsh dump vm dumpfile', and run > > > > crash on the dumpfile. > > > > > > > > The module is: > > > > > > > > --- > > > > #include <linux/module.h> > > > > > > > > int endless_loop(void) > > > > { > > > > printk("endless loop\n"); > > > > while (1); > > > > > > > > return 0; > > > > } > > > > > > > > int __init endless_init(void) > > > > { > > > > endless_loop(); > > > > > > > > return 0; > > > > } > > > > module_init(endless_init); > > > > > > > > MODULE_LICENSE("GPL"); > > > > --- > > > > > > > > crash bt command got: > > > > > > > > crash> bt -a > > > > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper" > > > > #0 [ffffffff81601e08] schedule at ffffffff813e8a49 > > > > #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e > > > > #2 [ffffffff81601ea0] need_resched at ffffffff8100970c > > > > #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b > > > > #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5 > > > > > > > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod" > > > > #0 [ffff88001e751dc8] schedule at ffffffff813e8a49 > > > > #1 [ffff88001e751dd0] schedule at ffffffff813e8aec > > > > #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90 > > > > #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86 > > > > #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210 > > > > #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca > > > > #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82 > > > > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206 > > > > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010 > > > > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030 > > > > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000 > > > > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da > > > > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010 > > > > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b > > > > > > > > Does it lose some function calls between do_one_initcall and retint_kernel? > > > > (endless_loop <- endless_init) > > > > > > > > > > In addition, if we don't stick in the init function (there is still a deadloop > > > somewhere in module but triggered by, say, reading a /proc file) then the backtrace > > > outputed by crash is correct. > > > > When you say "correct", I presume that you see your module functions as frames. > > But if you also see the backtrace starting with "schedule", then it's just luck > > that the backtrace bumped into your module functions. It just so happened that > > when walking back from schedule(), it "mistakenly" stumbled upon your module's > > functions. > > > > In the example above, I presume that when trying to backtrace from retint_kernel(), > > it stepped over your module's "loop" functions that were called via do_one_initcall(). > > That's why I suggest that you should probably see them on the kernel stack in > > between ffff88001e751e90 and ffff88001e751f20 if you use "bt -t". That is what > > "bt -t" is for -- the "bt" command is never guaranteed to be correct. > > Nothing new in between ffff88001e751e90 and ffff88001e751f20 with `bt -at': > > crash> bt -at > PID: 0 TASK: ffffffff81648020 CPU: 0 COMMAND: "swapper" > START: schedule at ffffffff813e8a49 > [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e > [ffffffff81601e60] __atomic_notifier_call_chain at ffffffff813ed799 > [ffffffff81601ea0] need_resched at ffffffff8100970c > [ffffffff81601eb0] default_idle at ffffffff81009f6b > [ffffffff81601ec0] cpu_idle at ffffffff81001bf5 > [ffffffff81601f10] rest_init at ffffffff813d72ec > [ffffffff81601f30] start_kernel at ffffffff816e1d77 > [ffffffff81601f40] command_line at ffffffff81718e90 > [ffffffff81601f70] x86_64_start_reservations at ffffffff816e12ac > [ffffffff81601f90] x86_64_start_kernel at ffffffff816e13a8 > > PID: 1088 TASK: ffff88001dda2d60 CPU: 1 COMMAND: "insmod" > START: schedule at ffffffff813e8a49 > [ffff88001e751dd0] schedule at ffffffff813e8aec > [ffff88001e751e40] get_parent_ip at ffffffff8102f193 > [ffff88001e751e60] sub_preempt_count at ffffffff813ed62c > [ffff88001e751e70] need_resched at ffffffff8102a975 > [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90 > [ffff88001e751e90] retint_kernel at ffffffff813eab86 > [ffff88001e751f20] do_one_initcall at ffffffff81000210 > [ffff88001e751f50] sys_init_module at ffffffff8106b7ca > [ffff88001e751f80] system_call_fastpath at ffffffff81002a82 > RIP: 00007f761bb58b7a RSP: 00007fff67a43120 RFLAGS: 00010206 > RAX: 00000000000000af RBX: ffffffff81002a82 RCX: 0000000000020010 > RDX: 0000000000b96010 RSI: 00000000000163da RDI: 0000000000b96030 > RBP: 0000000000b96010 R8: 0000000000010011 R9: 0000000000080000 > R10: 00007f761bb4b140 R11: 0000000000000202 R12: 00000000000163da > R13: 00007fff67a44985 R14: 00000000000163da R15: 0000000000b96010 > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b > crash> > > > This is the related part of output of `bt -ar' in case you're > interested: > > ffff88001e751e60: sub_preempt_count+146 ffff88001e751e78 > ffff88001e751e70: need_resched+30 ffff88001e751e88 > ffff88001e751e80: preempt_schedule_irq+82 ffff88001e751f18 > ffff88001e751e90: retint_kernel+38 ffff88001e751dd8 > ffff88001e751ea0: 0000000000000000 0000000000000004 > ffff88001e751eb0: 0000000000000000 ffff88001e751fd8 > ffff88001e751ec0: 0000000000000000 0000000000000001 > ffff88001e751ed0: 0000000000000000 ffffffffa00f9000 > ffff88001e751ee0: ffffffffffffff10 ffffffffa00f9004 > ffff88001e751ef0: 0000000000000010 0000000000000246 > ffff88001e751f00: ffff88001e751f18 0000000000000018 > ffff88001e751f10: ffff8800ffffffff ffff88001e751f48 > ffff88001e751f20: do_one_initcall+122 __this_module > ffff88001e751f30: 0000000000000000 0000000000020000 > ffff88001e751f40: 0000000000b96030 ffff88001e751f78 > ffff88001e751f50: sys_init_module+196 0000000000b96010 > ffff88001e751f60: 00000000000163da 00007fff67a44985 > ffff88001e751f70: 00000000000163da 0000000000b96010 > ffff88001e751f80: system_call_fastpath+22 0000000000000202 > ffff88001e751f90: 00007f761bb4b140 0000000000080000 > ffff88001e751fa0: 0000000000010011 00000000000000af > ffff88001e751fb0: 0000000000020010 0000000000b96010 > ffff88001e751fc0: 00000000000163da 0000000000b96030 > ffff88001e751fd0: 00000000000000af 00007f761bb58b7a > ffff88001e751fe0: 0000000000000033 0000000000010206 > ffff88001e751ff0: 00007fff67a43120 000000000000002b > > and `sym -m endless': > > crash> sym -m endless > ffffffffa00f6000 MODULE START: endless > ffffffffa00f6000 (?) endless_loop > ffffffffa00f6030 (r) __ksymtab_endless_loop > ffffffffa00f6040 (r) __kstrtab_endless_loop > ffffffffa00f6050 (d) __this_module > ffffffffa00f6345 MODULE END: endless > crash> Well in that case, there is no evidence left on the kernel stack by the endless_loop() function, i.e., the return address back to endless_loop() from from its call to printk(). Dave -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility