On Fri, Dec 29, 2017 at 9:32 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> From the various oopses, it looks like this happens when getting a
> double fault while trying to go idle. The CPU is probably trying
> to return from the double fault, but it didn't do anything useful in the
> fault handler so it just continues faulting, but the NMI watchdog can
> still get an oops out of it.

Hmm. Which oops are you looking at? The ones I see in the bugzilla
don't seem to have anything interesting in them.

[ Oh. I think I see the one you think of in the gentoo bug report ]

There does seem to be a lot of odd double faults that don't make
progress. And that in turn indicates that it may be about ESPFIX64 -
all other double fault cases should cause a fault printout, but
ESPFIX64 has a magical silent "turn double fault into a fake #GP
fault".

Maybe that one triggers over and over again?

> Couple more things:
>
> MCORE2 seems to get one oddball compiler flag (-march=core2):
>
>> cflags-$(CONFIG_MCORE2) += \
>>         $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
>
> It would be interesting to see if replacing the above "$(call" with:
>
>         $(call cc-option,-mtune=generic)
>
> makes the problem go away the same way as changing the .config option.

Definitely.

> The MCORE2 config option also sets CONFIG_X86_P6_NOP, which overrides
> the normal X86_64 noops, if I'm reading that code correctly.

Only for the ASM_NOPx nops, as far as I can tell. The actual
alternative NOP rewriting seems to pick the nops based on machine, not
on config options.

And I don't see anybody who actually uses the ASM_NOPx defines except
for arch/x86/kernel/kprobes/opt.c, which uses ASM_NOP5.

Am I missing something? We actually have a lot of lines in
arch/x86/include/asm/nops.h that set the ASM_NOPx values to the proper
things, but then they are never used.

We have that special "ASM_NOP5_ATOMIC" define that we are so careful
about, but again, it's actually never used as far as I can tell.
Maybe there's some magic token concatenation use that I'm missing in
my trivial grep, but it does seem to be dead code.

But double-checking that "-march=core2" case is definitely worth
looking into. Especially since there are clear indications that it's
gcc version-dependent anyway.

Alexander?

              Linus