* David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:

> > Just curious: did you write this code to debug the series, or was
> > there some original hair-tearing regression that motivated you? Is
> > there an upstream fix to marvel at and be horrified about in
> > equal measure?
>
> https://lore.kernel.org/all/2ab14f6f-2690-056b-cf9e-38a12dafd728@xxxxxxx/t/#u
> is the upstream fix.

Which ended up being the following upstream commit:

  88a921aa3c6b ("x86/sev: Ensure that RMP table fixups are reserved")

Might make sense to add this commit reference to one of the central
patches of the GDT/IDT code, to document how this feature is able to
pin down very hard-to-debug regressions. (Even if the upstream fix was
done independently in probably luckier circumstances.)

> [...] It's all the more horrifying because it was already *fixed*
> upstream before I lost weeks of my life to chasing it. And the
> trigger which actually made it *happen*, and made our production
> systems allocate memory within that dangerous 1MiB region adjacent to
> the RMP table, was a tweak to the NMI watchdog period... leading to
> an assumption that we were getting stray perf NMIs during the kexec,
> and a *long* wild goose chase based on that false assumption... :-/
> Once I'd written the debug code, I just wanted to clean it up a bit
> and push it out for the benefit of others; that *was* the main point
> of this series. All the rest of the cleanups are just yak shaving.
>
> The realisation that we never even explicitly mapped the control code
> page and always just got lucky because it happened to be in the same
> 2MiB or 1GiB superpage as something else that we did map... was just
> a bonus :)

I'm amazed and horrified in equal measure ;-)

> (That one is fixed in v3 which I'll post shortly, and is already in
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kexec-debug
> )
>
> > I'd argue that this debugging code probably needs a default-off Kconfig
> > option, even with the obvious hard-coded environmental limitations &
> > assumptions it has. Could be useful for very early debugging & would
> > preserve your effort without it bitrotting too obviously.
>
> Yeah. In v3 I've made it a config option, and made it use the
> early_printk serial console (as long as that's an I/O based 8250; we
> can add others too later).

That's lovely!

Thanks,

	Ingo