----- "Lucas Silacci" <Lucas.Silacci@xxxxxxxxxxxx> wrote:

> Sorry, guess I wasn't clear. Nobody hit the dump switch on these
> systems. They simply had multiple hardware errors that apparently
> triggered the NMI more than once. That's what I was trying to show with
> the SEL records, that the multiple NMIs were straight from hardware with
> no human intervention.
>
> The systems went through a panic (due to multiple NMIs),

That's what I'm trying to figure out: when and how was it decided that
the machine should panic instead of continuing to handle the stream of
NMIs? In other words, why was this "dumpsw_notify" function called?

> > PID: 0      TASK: ffffffff8038c340  CPU: 0   COMMAND: "swapper"
> >  #0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
> >  #1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
> >  #2 [ffffffff8046dde0] panic at ffffffff801327fa
> >  #3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
> >  #4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
> >  #5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
> >  #6 [ffffffff8046df40] do_nmi at ffffffff80323365
> >  #7 [ffffffff8046df50] nmi at ffffffff8032268f
> >     [exception RIP: smp_send_stop+84]
> >     RIP: ffffffff80116e44  RSP: ffffffff8046ddd8  RFLAGS: 00000246
> >     RAX: 00000000000000ff  RBX: ffffffff8831c1f8  RCX: 000041049c7256e8
> >     RDX: 0000000000000005  RSI: 000000005238a938  RDI: 00000000002896a0
> >     RBP: ffffffff8046df08   R8: 00000000000040fb   R9: 000000005238a7e8
> >     R10: 0000000000000002  R11: 0000ffff0000ffff  R12: 000000000000000c
> >     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > --- <NMI exception stack> ---
> >  #8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44

From what you're implying, there is no physical "dump switch". So I'm
trying to figure out where that "dumpsw_notify()" function comes from.
Whose module is that, and what is its purpose?
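The backtrace shows the path: default_do_nmi() hands the NMI to notifier_call_chain(), which invokes the callbacks registered on that chain one after another, and one of them, dumpsw_notify(), calls panic(). The chain walker only iterates; the decision to panic is made inside whichever callback runs. Below is a minimal user-space C sketch of that pattern, under stated assumptions: the sketch_* types, the SKETCH_* constants, and both handlers are hypothetical simplifications, not the kernel's notifier API, and hypothetical_dump_handler merely stands in for a callback like dumpsw_notify, whose real logic is exactly what is still unknown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified return codes (NOT the kernel's NOTIFY_* values). */
#define SKETCH_NOTIFY_DONE 0   /* callback did not claim the event */
#define SKETCH_NOTIFY_STOP 1   /* callback claimed it; stop walking the chain */

#define SKETCH_EVENT_NMI 1UL   /* hypothetical event code for "an NMI arrived" */

/* One registered callback; a higher priority runs earlier. */
struct sketch_notifier {
    int (*call)(struct sketch_notifier *nb, unsigned long event, void *data);
    int priority;
    struct sketch_notifier *next;
};

static struct sketch_notifier *nmi_chain = NULL;
static int would_panic = 0;   /* set instead of really panicking */

/* Insert nb so the list stays sorted by descending priority;
 * equal priorities keep registration order. */
static void sketch_register(struct sketch_notifier **head,
                            struct sketch_notifier *nb)
{
    assert(nb != NULL && nb->call != NULL);
    while (*head && (*head)->priority >= nb->priority)
        head = &(*head)->next;
    nb->next = *head;
    *head = nb;
}

/* Call each callback in order until one returns SKETCH_NOTIFY_STOP.
 * Returns how many callbacks were invoked. */
static int sketch_call_chain(struct sketch_notifier *head,
                             unsigned long event, void *data)
{
    int called = 0;
    for (struct sketch_notifier *nb = head; nb != NULL; nb = nb->next) {
        called++;
        if (nb->call(nb, event, data) == SKETCH_NOTIFY_STOP)
            break;
    }
    return called;
}

/* Hypothetical handler that looks at the event and passes it on. */
static int passive_handler(struct sketch_notifier *nb,
                           unsigned long event, void *data)
{
    (void)nb; (void)event; (void)data;
    return SKETCH_NOTIFY_DONE;
}

/* Hypothetical stand-in for a module callback such as dumpsw_notify:
 * on any NMI it decides the machine should go down. */
static int hypothetical_dump_handler(struct sketch_notifier *nb,
                                     unsigned long event, void *data)
{
    (void)nb; (void)data;
    if (event == SKETCH_EVENT_NMI) {
        would_panic = 1;   /* a real kernel callback would call panic() here */
        return SKETCH_NOTIFY_STOP;
    }
    return SKETCH_NOTIFY_DONE;
}

static struct sketch_notifier passive_nb = { passive_handler, 10, NULL };
static struct sketch_notifier dump_nb    = { hypothetical_dump_handler, 0, NULL };
static struct sketch_notifier late_nb    = { passive_handler, -10, NULL };
```

Registering dump_nb, late_nb, and passive_nb (in any order) and then calling sketch_call_chain(nmi_chain, SKETCH_EVENT_NMI, NULL) invokes passive_handler and then hypothetical_dump_handler, which sets would_panic and stops the walk, so late_nb never runs. Which callback "decides" is purely a matter of what got registered on the chain and at what priority.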
Dave

> a reboot, and
> then crash was run on the resulting dump. In fact crash was
> automatically run via a startup script and there was no human
> intervention until after it was noticed that crash was filling up the
> root file system with a temporary file due to the infinite loop.

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility