On Sat, 24 May 2014 20:24:30 +0800
oliver yang <yangoliver@xxxxxxxxx> wrote:

> 2014-04-29 19:27 GMT+08:00 Petr Tesarik <ptesarik@xxxxxxx>:
> >
> > It will show an incorrect register dump, but the backtrace continues.
> > For example:
> >
>
> Hi Petr,
>
> The back trace looks good.
>
> How did you know the register dump is incorrect?

The saved registers did not make any sense in the interrupted code. ;-)
And one of them, which should have been a pointer, looked like RFLAGS.

> At least the value of RSP saved in NMI stack seemed to be good,
>
> RSP: ffff880232b2ff18

Yes, SS, RSP, RFLAGS, CS, and RIP may look good, because they are pushed
onto the stack by the CPU. But they may point back to an NMI if it was a
nested NMI. See my comments below.

> Recently, I'm working on a core file analysis, and found crash tool
> couldn't give the correct NMI back trace.
> But I can find right stack trace by using IST pointer.
>
> I'm wondering whether your patch could work for my cases.
>
> May I can try your fix after it is ready.

See https://www.redhat.com/archives/crash-utility/2014-April/msg00038.html

It's now also in crash git, see commit
8e15958e1b7183bbfbdf004f0ad8f2b62f023f9f.

So, how do you recognize a wrong register dump?
Some symptoms:

> > PID: 0      TASK: ffff880232b2c440  CPU: 7   COMMAND: "kworker/0:1"
> >  #0 [ffff88023fdc7e40] crash_nmi_callback at ffffffff8102428f
> >  #1 [ffff88023fdc7e50] notifier_call_chain at ffffffff81461ec7
> >  #2 [ffff88023fdc7e80] __atomic_notifier_call_chain at ffffffff81461f0d
> >  #3 [ffff88023fdc7e90] notify_die at ffffffff81461f5d
> >  #4 [ffff88023fdc7ec0] default_do_nmi at ffffffff8145f3a7
> >  #5 [ffff88023fdc7ee0] do_nmi at ffffffff8145f5d8
> >  #6 [ffff88023fdc7ef0] restart_nmi at ffffffff8145eb2d
> >     [exception RIP: mwait_idle+423]
> >     RIP: ffffffff8100b217  RSP: ffff880232b2ff18  RFLAGS: 00000246
> >     RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246

RAX is the kernel code segment (copied CS)
RBX is the kernel code segment (saved CS)
RCX looks like RFLAGS (note the typical 246 at the end).

> >     RDX: ffff880232b2ff18  RSI: 0000000000000018  RDI: 0000000000000001

RDX points to a kernel stack
RSI is the kernel data segment (copied SS)
RDI is always 1 (the NMI executing flag)

> >     RBP: ffffffff8100b217   R8: ffffffff8100b217   R9: 0000000000000018

RBP points to kernel text
R8 points to kernel text
R9 is the kernel data segment (saved SS)

> >     R10: ffff880232b2ff18  R11: 0000000000000246  R12: ffffffffffffffff

R10 points to a kernel stack
R11 looks like RFLAGS

HTH,
Petr Tesarik

> >     R13: ffffffff81d36108  R14: ffff880232b2ffd8  R15: 0000000000000000
> >     ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
> > --- <NMI exception stack> ---
> >  #7 [ffff880232b2ff18] mwait_idle at ffffffff8100b217
> >  #8 [ffff880232b2ff30] cpu_idle at ffffffff81002126
> >
> > If there is a nested NMI, reading the code suggests crash may loop again
> > to the NMI stack, but I don't have a sample dump file ATM.
> >
> > Petr T

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
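
The symptoms listed in this message could be checked mechanically, roughly as
in the Python sketch below. This is only an illustration and not something
crash itself does: the helper names (repeated_frame_values, dump_looks_bogus)
and the threshold of 4 are assumptions made up for the example. The register
values are the ones from the dump quoted in this thread.

```python
# Hypothetical helpers, not part of crash: flag general-purpose registers
# whose values merely repeat the exception frame pushed by the CPU.
HW_FRAME = ("RIP", "CS", "RFLAGS", "RSP", "SS")
GPRS = ("RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RBP",
        "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15")


def repeated_frame_values(regs):
    """Map each GPR that equals a hardware-pushed value to that value's name."""
    hits = {}
    for gpr in GPRS:
        for hw in HW_FRAME:
            if regs[gpr] == regs[hw]:
                hits[gpr] = hw
                break
    return hits


def dump_looks_bogus(regs, threshold=4):
    # The threshold is an arbitrary assumption: one or two coincidences can
    # happen in a genuine dump, many of them at once are unlikely.
    return len(repeated_frame_values(regs)) >= threshold


# Register values copied from the backtrace quoted in this thread.
SAMPLE = {
    "RIP": 0xffffffff8100b217, "RSP": 0xffff880232b2ff18, "RFLAGS": 0x246,
    "CS": 0x10, "SS": 0x18,
    "RAX": 0x10, "RBX": 0x10, "RCX": 0x246,
    "RDX": 0xffff880232b2ff18, "RSI": 0x18, "RDI": 0x1,
    "RBP": 0xffffffff8100b217, "R8": 0xffffffff8100b217, "R9": 0x18,
    "R10": 0xffff880232b2ff18, "R11": 0x246, "R12": 0xffffffffffffffff,
    "R13": 0xffffffff81d36108, "R14": 0xffff880232b2ffd8, "R15": 0x0,
}

if __name__ == "__main__":
    for gpr, hw in repeated_frame_values(SAMPLE).items():
        print(f"{gpr} repeats {hw}")
    print("suspicious:", dump_looks_bogus(SAMPLE))
```

On the quoted dump this flags ten of the fifteen general-purpose registers.
RDI = 1 is not caught, because it is the NMI executing flag and not a copy of
a frame value; it would need a separate check.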