Debugging hung Xeon platform... nmi_watchdog=1 not helping

Hi.

I’m developing network software (traffic shaping with netfilter and tc, etc.) on a Xeon platform (E3-1225 v3).

The same image can run on a Xeon D-1518 SoC under moderate load and never hang, but on the E3-1225 I’m seeing lockups.

I tried messing with the iTCO_wdt watchdog timer, but the best I could get to happen was the SMI resetting the system (not sure if it’s winking the power supply or just asserting INIT# on the PCH).

I’d REALLY love to get a kernel panic() so I could see what my cores are doing the next time the system locks up and the watchdog device stops being tickled from user-space.
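
For reference, tickling a Linux watchdog device from user-space usually comes down to periodic writes to the device node. The following is only a minimal sketch, assuming the standard /dev/watchdog character-device interface; the function name pet_watchdog and its parameters are hypothetical and this is not the daemon running on this system.

```python
import os
import time


def pet_watchdog(path="/dev/watchdog", interval_s=1.0, count=3, disarm=False):
    """Open a watchdog device and write to it `count` times.

    Assumption: `path` behaves like the standard Linux watchdog device,
    where opening it arms the timer and any write restarts the countdown.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(count):
            os.write(fd, b"\0")  # any byte restarts the countdown
            time.sleep(interval_s)
        if disarm:
            # "Magic close": writing 'V' just before close asks the driver
            # to stop the timer, unless the driver was loaded with nowayout.
            os.write(fd, b"V")
    finally:
        os.close(fd)
```

If the process doing these writes stops being scheduled and the timer was not disarmed, the hardware timer expires; what happens next (reset, power-off, or NMI) is up to the driver and the platform.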

Out of ideas, however.

I emailed the linux-watchdog mailing list, but it seems that either (a) some iTCO parameters of the PCH are set by BIOS and can’t be changed by Linux (they’re locked until a reset), or (b) the drivers are relying on certain registers being set correctly by BIOS, which isn’t the case on my hardware (a Lanner Inc. FW-8771), or (c) there are too many chip-specific parameters that need to be set and the driver doesn’t handle my particular silicon.

All of those theories are consistent with the fact that I can’t reproduce this on a SuperMicro 5018D-FN8T (with the Xeon D-1518 SoC) when all external circumstances (i.e. traffic loading) are identical and I run the same exact disk image.

Oh, and to get back to the original subject: when I add nmi_watchdog=1 to my Linux command line, the system powers off when it gets to the software lockup… without that, it seems to do an SMI-driven INIT# or power-cycle (can’t tell which).
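
One sanity check here is to confirm that the nmi_watchdog parameter actually reached the kernel and that the NMI watchdog is enabled at runtime. A minimal sketch, assuming /proc/cmdline is readable and that /proc/sys/kernel/nmi_watchdog exists (it is only present when the kernel is built with the lockup detector); the helper names are hypothetical.

```python
def cmdline_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line.

    Returns None if the parameter is absent and "" if it appears as a
    bare flag with no "=value" part.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_text(path):
    """Read a small text file (such as a /proc entry) and strip whitespace."""
    with open(path) as f:
        return f.read().strip()


def nmi_watchdog_status(cmdline_path="/proc/cmdline",
                        sysctl_path="/proc/sys/kernel/nmi_watchdog"):
    """Compare what was requested at boot with what the kernel reports now."""
    return {
        "requested": cmdline_param(read_text(cmdline_path), "nmi_watchdog"),
        "active": read_text(sysctl_path),
    }
```

Calling nmi_watchdog_status() on the target should show "1" for both entries if the parameter was picked up and the detector is running; any other result suggests the boot loader dropped the parameter or the kernel could not enable the detector.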

Been staring at this issue for 2 weeks and I’m starting to lose my mind.

What’s the collective wisdom about what to do in such a situation?

Thanks,

-Philip
