On Tue, 7 Sep 2010 10:24:29 -0400 (EDT)
Dave Anderson <anderson@xxxxxxxxxx> wrote:

>
> ----- "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> > On Thu, 2 Sep 2010 08:44:12 -0400 (EDT)
> > Dave Anderson <anderson@xxxxxxxxxx> wrote:
> >
> > >
> > > ----- hutao@xxxxxxxxxxxxxx wrote:
> > >
> > > > Hi,
> > > >
> > > > I got a problem where it seemed crash got a bad backtrace.
> > > > The problem occurred under the following conditions:
> > > > On a qemu guest system loading a module that stuck at
> > > > the init function(say, call a function that did deadlooping),
> > > > then dumped the guest by `virsh dump vm dumpfile', and run
> > > > crash on the dumpfile.
> > > >
> > > > The module is:
> > > >
> > > > ---
> > > > #include <linux/module.h>
> > > >
> > > > int endless_loop(void)
> > > > {
> > > >         printk("endless loop\n");
> > > >         while (1);
> > > >
> > > >         return 0;
> > > > }
> > > >
> > > > int __init endless_init(void)
> > > > {
> > > >         endless_loop();
> > > >
> > > >         return 0;
> > > > }
> > > > module_init(endless_init);
> > > >
> > > > MODULE_LICENSE("GPL");
> > > > ---
> > > >
> > > > crash bt command got:
> > > >
> > > > crash> bt -a
> > > > PID: 0      TASK: ffffffff81648020  CPU: 0   COMMAND: "swapper"
> > > >  #0 [ffffffff81601e08] schedule at ffffffff813e8a49
> > > >  #1 [ffffffff81601e18] apic_timer_interrupt at ffffffff8100344e
> > > >  #2 [ffffffff81601ea0] need_resched at ffffffff8100970c
> > > >  #3 [ffffffff81601eb0] default_idle at ffffffff81009f6b
> > > >  #4 [ffffffff81601ec0] cpu_idle at ffffffff81001bf5
> > > >
> > > > PID: 1088   TASK: ffff88001dda2d60  CPU: 1   COMMAND: "insmod"
> > > >  #0 [ffff88001e751dc8] schedule at ffffffff813e8a49
> > > >  #1 [ffff88001e751dd0] schedule at ffffffff813e8aec
> > > >  #2 [ffff88001e751e80] preempt_schedule_irq at ffffffff813e8c90
> > > >  #3 [ffff88001e751e90] retint_kernel at ffffffff813eab86
> > > >  #4 [ffff88001e751f20] do_one_initcall at ffffffff81000210
> > > >  #5 [ffff88001e751f50] sys_init_module at ffffffff8106b7ca
> > > >  #6 [ffff88001e751f80] system_call_fastpath at ffffffff81002a82
> > > >     RIP: 00007f761bb58b7a  RSP: 00007fff67a43120  RFLAGS: 00010206
> > > >     RAX: 00000000000000af  RBX: ffffffff81002a82  RCX: 0000000000020010
> > > >     RDX: 0000000000b96010  RSI: 00000000000163da  RDI: 0000000000b96030
> > > >     RBP: 0000000000b96010   R8: 0000000000010011   R9: 0000000000080000
> > > >     R10: 00007f761bb4b140  R11: 0000000000000202  R12: 00000000000163da
> > > >     R13: 00007fff67a44985  R14: 00000000000163da  R15: 0000000000b96010
> > > >     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
> > > >
> > > > Does it lose some function calls between do_one_initcall and retint_kernel?
> > > > (endless_loop <- endless_init)
> > >
> > > Your best bet is to use "bt -t" in a case such as that.
> > >
> > > If there are no "starting hooks" for the backtrace code to use, then
> > > it simply defaults to the RSP value left in the task->thread_struct->rsp,
> > > and the RIP of the instruction following "__switch_to". Those will be
> > > stale, because they represent the last time that the task blocked in
> > > kernel space. In the case of your endless loop inside the kernel, there
> > > is nothing for the crash utility to grab onto as the starting points because
> > > the task is essentially "active". It's somewhat similar in nature to
> > > using "bt -a" on a live system -- the tasks are running either in
> > > kernel or user space, but do not have any "starting points" for the
> > > backtrace code to latch onto, so it's not even allowed as a command.
> > >
> >
> > Hmm, but IIUC, a vmcore taken by kdump on a real host (not on a virtual
> > machine) can show endless_loop(). Is that because kdump reads the image of
> > a panicked host? And should we treat a vmcore generated by "virsh dump" as
> >  - "it's just a live dump image and there is no guarantee of synchronous
> >    register information. If you care, please freeze the kernel by some switch".
> >
> > Can sending SIGSTOP or something similar to qemu help us take a synchronous
> > snapshot of the registers?
>
> Kdump works because the shutdown sends an NMI to each cpu, leaving an obvious
> shutdown trail that can be tracked from the NMI stack back to the process stack.
>
> You could also try using alt-sysrq-c on the guest prior to taking the virsh dump
> from the host.
>

Thank you. I'm now considering sending an NMI to the guest via virsh before
starting the dump, rather than suspending it (if it is not a live dump),
and seeing what happens.

Thanks,
-Kame

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
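
The host-side sequence discussed in this thread might be scripted roughly as
follows. This is only a sketch under stated assumptions: `vm` and `dumpfile`
are the placeholder names from the original report, `virsh inject-nmi` is
assumed to be available in the installed libvirt, and the `run` helper and
`DRY_RUN` variable are hypothetical conveniences that are not part of any tool.
Whether the injected NMI leaves a trail that crash can follow is exactly the
open question above.

```shell
#!/bin/sh
# Sketch only: DOMAIN and DUMPFILE default to the placeholder names used
# in the original report; adjust them for a real guest.
DOMAIN="${DOMAIN:-vm}"
DUMPFILE="${DUMPFILE:-dumpfile}"

# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Alternative suggested in the thread, done inside the guest rather than
# from the host: alt-sysrq-c, or equivalently (as root, sysrq enabled)
#   echo c > /proc/sysrq-trigger

# Send an NMI to the guest before dumping (assumes virsh inject-nmi exists).
run virsh inject-nmi "$DOMAIN"

# Take the dump from the host, as in the original report.
run virsh dump "$DOMAIN" "$DUMPFILE"
```

With DRY_RUN=0 the two virsh commands are actually executed; crash would then
be run on the resulting dumpfile as in the original report.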