On 27/04/2022 19:48, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
>
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
>
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=<addr1>, prev=<addr2>, next=<addr1>.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>
> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
>
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
>
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path")
> Cc: David P. Reed <dpreed@xxxxxxxxxxxx>
> Cc: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
> ---
>  arch/x86/include/asm/cpu.h |  1 +
>  arch/x86/kernel/crash.c    |  8 ++++----
>  arch/x86/kernel/reboot.c   | 14 ++++++++++++--
>  3 files changed, 17 insertions(+), 6 deletions(-)
>

Hi Paolo / Sean / Vitaly, sorry for the ping.
But do you think this fix is OK from the VMX point-of-view? I'd like to
send a V2 of this set soon, so any review here is highly appreciated!

Cheers,


Guilherme
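
P.S. For anyone skimming the thread, here is a rough userspace sketch of the
idea in the quoted description. It is not the patch itself: every name in it
(struct handler, shootdown_done, nmi_shootdown(), emergency_restart_guarded())
is hypothetical, and the "NMI list" is just a simulated singly linked list. It
only illustrates why a second registration of the same callback is a double
add, and how a "was the shootdown already done?" guard avoids it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one callback entry in the NMI handler list. */
struct handler {
	struct handler *next;
	bool registered;
};

static struct handler *handler_list;	/* head of the simulated list */
static struct handler crash_handler;	/* the single shootdown callback */
static bool shootdown_done;		/* hypothetical guard flag */
static int nmi_rounds;			/* times NMIs were "sent" to other CPUs */

/* Returns false on a double add, the condition list debugging reports as a BUG. */
static bool register_handler(struct handler *h)
{
	if (h->registered)
		return false;
	h->next = handler_list;
	handler_list = h;
	h->registered = true;
	return true;
}

/* Simulated shootdown: register the callback, then "send" NMIs to the other CPUs. */
static bool nmi_shootdown(void)
{
	if (!register_handler(&crash_handler))
		return false;
	nmi_rounds++;
	shootdown_done = true;
	return true;
}

/* Simulated emergency restart with the guard in place. */
static bool emergency_restart_guarded(void)
{
	/* The local CPU would still have VMX disabled at this point. */
	if (shootdown_done)
		return true;	/* other CPUs were already stopped, skip the NMIs */
	return nmi_shootdown();
}
```

In this sketch, calling nmi_shootdown() twice in a row fails the second time
(the simulated double add), while calling nmi_shootdown() from the "panic
path" and then emergency_restart_guarded() succeeds and sends NMIs only once.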