On Thu, 26 Sep 2024 18:22:39 -0700
Eric Mackay <eric.mackay@xxxxxxxxxx> wrote:

> > On 9/24/24 5:40 AM, Igor Mammedov wrote:
> >> On Fri, 19 Apr 2024 12:17:01 -0400
> >> boris.ostrovsky@xxxxxxxxxx wrote:
> >>
> >>> On 4/17/24 9:58 AM, boris.ostrovsky@xxxxxxxxxx wrote:
> >>>>
> >>>> I noticed that I was using a few months old qemu bits and now I am
> >>>> having trouble reproducing this on latest bits. Let me see if I can get
> >>>> this to fail with latest first and then try to trace why the processor
> >>>> is in this unexpected state.
> >>>
> >>> Looks like 012b170173bc "system/qdev-monitor: move drain_call_rcu call
> >>> under if (!dev) in qmp_device_add()" is what makes the test stop failing.
> >>>
> >>> I need to understand whether the lack of failures is a side effect of timing
> >>> changes that simply make hotplug fail less likely or if this is an
> >>> actual (but seemingly unintentional) fix.
> >>
> >> Agreed, we should find out the culprit of the problem.
> >
> > I haven't been able to spend much time on this unfortunately, Eric is
> > now starting to look at this again.
> >
> > One of my theories was that ich9_apm_ctrl_changed() is sending SMIs to
> > vcpus serially, while on HW my understanding is that this is done as a
> > broadcast, so I thought this could cause a race. I had a quick test with
> > pausing and resuming all vcpus around the loop but that didn't help.
> >
> >>
> >> PS:
> >> also if you are using an AMD host, there was a regression in OVMF
> >> where a vCPU that OSPM was already online-ing was yanked
> >> from under OSPM's feet by OVMF (which depending on timing could
> >> manifest as a lost SIPI).
> >>
> >> edk2 commit that should fix it is:
> >> https://github.com/tianocore/edk2/commit/1c19ccd5103b
> >>
> >> Switching to an Intel host should rule that out at least.
> >> (or use the fixed edk2-ovmf-20240524-5.el10.noarch package from centos,
> >> if you are forced to use an AMD host)
>
> I haven't been able to reproduce the issue on an Intel host thus far,
> but it may not be an apples-to-apples comparison because my AMD hosts
> have a much higher core count.
>
> > I just tried with latest bits that include this commit and was still
> > able to reproduce the problem.
> >
> >-boris
>
> The initial hotplug of each CPU appears to complete from the
> perspective of OVMF and OSPM. SMBASE relocation succeeds, and the new
> CPU reports back from the pen. It seems to be the later INIT-SIPI-SIPI
> sequence sent from the guest that doesn't complete.
>
> My working theory has been that some CPU/AP is lagging behind the others
> when the BSP is waiting for all the APs to go into SMM, and the BSP just
> gives up and moves on. Presumably the INIT-SIPI-SIPI is sent while that
> CPU finally does go into SMM, and the other CPUs are in normal mode.
>
> I've been able to observe that the SMI handler for the problematic CPU will
> sometimes start running when no BSP is elected. This means we have a
> window of time where the CPU will ignore SIPI, and at least 1 CPU is in
> normal mode (the BSP) which is capable of sending INIT-SIPI-SIPI from
> the guest.

I've re-read the whole thread and noticed Boris was saying:

> On Tue, Apr 16, 2024 at 10:57 PM <boris.ostrovsky@xxxxxxxxxx> wrote:
> > On 4/16/24 4:53 PM, Paolo Bonzini wrote:
...
> > >
> > > What is the reproducer for this?
> >
> > Hotplugging/unplugging cpus in a loop, especially if you oversubscribe
> > the guest, will get you there in 10-15 minutes.
...

So there was unplug involved as well, which was broken since forever.
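(An aside on the serial-vs-broadcast SMI theory quoted above: the
experiment Boris describes amounts to roughly the sketch below. This is
not the actual ich9_apm_ctrl_changed() code nor his patch, just an
illustration of raising CPU_INTERRUPT_SMI on each vCPU in turn while the
loop is bracketed with pause_all_vcpus()/resume_all_vcpus() to
approximate a simultaneous broadcast; header paths differ between QEMU
versions and the snippet is not meant to build standalone.)

/*
 * Rough illustration only, not the real lpc_ich9.c code: raise the SMI
 * on every vCPU while all of them are held, so no vCPU can run ahead in
 * normal mode before the others have the SMI latched.
 */
#include "qemu/osdep.h"
#include "hw/core/cpu.h"   /* CPUState, CPU_FOREACH(), cpu_interrupt() */
#include "sysemu/cpus.h"   /* pause_all_vcpus(), resume_all_vcpus() */

static void smi_broadcast_experiment(void)
{
    CPUState *cs;

    pause_all_vcpus();                        /* freeze every vCPU first */
    CPU_FOREACH(cs) {
        /* CPU_INTERRUPT_SMI is the x86-specific SMI request bit
         * (target/i386/cpu.h) */
        cpu_interrupt(cs, CPU_INTERRUPT_SMI);
    }
    resume_all_vcpus();                       /* let them resume together */
}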
A recent patch,
https://patchew.org/QEMU/20230427211013.2994127-1-alxndr@xxxxxx/20230427211013.2994127-2-alxndr@xxxxxx/
exposed an issue (an unexpected plug/unplug flow) with the root cause in
OVMF: the firmware was letting uninvolved APs run wild in normal mode.
As a result, the AP that called _EJ0 and held the ACPI lock could finish
_EJ0 and release the ACPI lock while the BSP and the CPU being removed
were still in the SMM world. Any other plug/unplug operation could then
grab the ACPI lock and trigger another SMI, which breaks the hotplug
flow's expectations (i.e. exclusive access to the hotplug registers
during a plug/unplug operation).

Perhaps that's what you are observing. Please check if the following
helps:
https://github.com/kraxel/edk2/commit/738c09f6b5ab87be48d754e62deb72b767415158

So yes, a SIPI can be lost (which should be expected, as others noted),
but that normally shouldn't be an issue because
wakeup_secondary_cpu_via_init() resends the SIPI. However, if
wakeup_secondary_cpu is set to another handler that doesn't resend the
SIPI, it might be an issue.
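For reference, the resend comes from the generic x86 wakeup path: the
sketch below loosely paraphrases wakeup_secondary_cpu_via_init() from
arch/x86/kernel/smpboot.c (ICR-idle waits, error checks and the exact
delays are omitted). The point is only that the STARTUP IPI is issued
twice, so a single lost SIPI is normally absorbed, whereas a
wakeup_secondary_cpu override that sends just one SIPI has no such
retry.

/*
 * Loose paraphrase of the kernel's INIT-SIPI-SIPI sequence, not the
 * actual wakeup_secondary_cpu_via_init(); waits on the ICR and error
 * handling are left out for brevity.
 */
#include <asm/apic.h>       /* apic_icr_write(), APIC_DM_*, APIC_INT_* */
#include <linux/delay.h>    /* udelay() */

static void init_sipi_sipi_sketch(u32 apicid, unsigned long start_eip)
{
    int j;

    /* INIT assert, then de-assert */
    apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
                   apicid);
    udelay(10000);
    apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, apicid);

    /*
     * Two STARTUP IPIs: if the AP misses the first one (e.g. it is still
     * inside SMM, where SIPI is ignored), the second one normally still
     * wakes it up.
     */
    for (j = 0; j < 2; j++) {
        apic_icr_write(APIC_DM_STARTUP | (start_eip >> 12), apicid);
        udelay(300);
    }
}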