On Wed, Nov 30 2022 at 23:36, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work. NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt. At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.
>
> With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
> be reached after kdump_nmi_shootdown_cpus() (currently the only two users
> of nmi_shootdown_cpus()).
>
> To fix, move the disabling of virtualization into crash_nmi_callback(),
> remove emergency_vmx_disable_all()'s callback, and do a shootdown for
> emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
> The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
> (obviously), and since I can't envision a use case for an NMI shootdown that
> doesn't want to disable virtualization, doing that in the core handler means
> emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
> doesn't care when that shootdown happened or what callback may have run.

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>