----- "Lucas Silacci" <Lucas.Silacci@xxxxxxxxxxxx> wrote:
> Hi,
>
> I've run into an issue where crash will enter an infinite loop while
> decoding exception stacks if those stacks get corrupted.
>
> We've seen this on four different systems where the hardware generated
> multiple NMIs and the second and subsequent NMIs caused the NMI
> exception stack to be overwritten. When this condition is hit, the
> bottom rsp on the NMI exception stack (which would normally point you
> back to the kernel thread stack or possibly a different exception stack)
> points you back into the middle of the same NMI exception stack. This
> causes crash to infinitely loop when it tries to decode that exception
> stack.
>
> Now clearly the root cause of the issue is faulty hardware that
> generated multiple NMIs. However a very small change in crash can detect
> this issue and stop the infinite loop from happening thereby allowing
> you to get to a point in crash where you can actually tell that it was
> an NMI that caused the system to dump.
>
> The patch is attached to this email. For x86_64 it will detect the
> condition of any exception stack that points back at itself.
>
> Please feel free to ask me any questions on this.

Wow, that's pretty interesting -- I've certainly never seen that before.

Can you show me what the backtrace looks like with your patch applied?

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility