On Thu, 26 Sep 2024 18:22:39 -0700
Eric Mackay <eric.mackay@xxxxxxxxxx> wrote:

> > On 9/24/24 5:40 AM, Igor Mammedov wrote:
> >> On Fri, 19 Apr 2024 12:17:01 -0400
> >> boris.ostrovsky@xxxxxxxxxx wrote:
> >>
> >>> On 4/17/24 9:58 AM, boris.ostrovsky@xxxxxxxxxx wrote:
> >>>>
> >>>> I noticed that I was using a few months old qemu bits and now I am
> >>>> having trouble reproducing this on latest bits. Let me see if I can get
> >>>> this to fail with latest first and then try to trace why the processor
> >>>> is in this unexpected state.
> >>>
> >>> Looks like 012b170173bc "system/qdev-monitor: move drain_call_rcu call
> >>> under if (!dev) in qmp_device_add()" is what makes the test stop failing.
> >>>
> >>> I need to understand whether the lack of failures is a side effect of timing
> >>> changes that simply make hotplug fail less likely or if this is an
> >>> actual (but seemingly unintentional) fix.
> >>
> >> Agreed, we should find out the culprit of the problem.
> >
> > I haven't been able to spend much time on this unfortunately, Eric is
> > now starting to look at this again.
> >
> > One of my theories was that ich9_apm_ctrl_changed() is sending SMIs to
> > vcpus serially, while on HW my understanding is that this is done as a
> > broadcast, so I thought this could cause a race. I had a quick test with
> > pausing and resuming all vcpus around the loop but that didn't help.
> >
> >>
> >> PS:
> >> also if you are using an AMD host, there was a regression in OVMF
> >> where a vCPU that OSPM was already online-ing was yanked
> >> from under OSPM's feet by OVMF (which depending on timing could
> >> manifest as a lost SIPI).
> >>
> >> edk2 commit that should fix it is:
> >> https://github.com/tianocore/edk2/commit/1c19ccd5103b
> >>
> >> Switching to an Intel host should rule that out at least.
> >> (or use the fixed edk2-ovmf-20240524-5.el10.noarch package from centos,
> >> if you are forced to use an AMD host)
>
> I haven't been able to reproduce the issue on an Intel host thus far,
> but it may not be an apples-to-apples comparison because my AMD hosts
> have a much higher core count.
>
> > I just tried with latest bits that include this commit and was still
> > able to reproduce the problem.
> >
> >-boris
>
> The initial hotplug of each CPU appears to complete from the
> perspective of OVMF and OSPM. SMBASE relocation succeeds, and the new
> CPU reports back from the pen. It seems to be the later INIT-SIPI-SIPI
> sequence sent from the guest that doesn't complete.
>
> My working theory has been that some CPU/AP is lagging behind the others
> when the BSP is waiting for all the APs to go into SMM, and the BSP just
> gives up and moves on. Presumably the INIT-SIPI-SIPI is sent while that
> CPU finally does go into SMM, and the other CPUs are in normal mode.
>
> I've been able to observe that the SMI handler for the problematic CPU will
> sometimes start running when no BSP is elected. This means we have a
> window of time where the CPU will ignore SIPI, and at least 1 CPU is in
> normal mode (the BSP) which is capable of sending INIT-SIPI-SIPI from
> the guest.

I've re-read the whole thread and noticed Boris was saying:

> On Tue, Apr 16, 2024 at 10:57 PM <boris.ostrovsky@xxxxxxxxxx> wrote:
> > On 4/16/24 4:53 PM, Paolo Bonzini wrote:
...
> > >
> > > What is the reproducer for this?
> >
> > Hotplugging/unplugging cpus in a loop, especially if you oversubscribe
> > the guest, will get you there in 10-15 minutes.
...

So there was unplug involved as well, which was broken since forever.
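(An aside on the serial-vs-broadcast SMI theory quoted above: the
experiment Boris describes amounts to roughly the sketch below. This is
not the actual ich9_apm_ctrl_changed() code nor his patch, just an
illustration of raising CPU_INTERRUPT_SMI on each vCPU in turn while the
loop is bracketed with pause_all_vcpus()/resume_all_vcpus() to
approximate a simultaneous broadcast; header paths differ between QEMU
versions and the snippet is not meant to build standalone.)

/*
 * Rough illustration only, not the real lpc_ich9.c code: raise the SMI
 * on every vCPU while all of them are held, so no vCPU can run ahead in
 * normal mode before the others have the SMI latched.
 */
#include "qemu/osdep.h"
#include "hw/core/cpu.h"   /* CPUState, CPU_FOREACH(), cpu_interrupt() */
#include "sysemu/cpus.h"   /* pause_all_vcpus(), resume_all_vcpus() */

static void smi_broadcast_experiment(void)
{
    CPUState *cs;

    pause_all_vcpus();                        /* freeze every vCPU first */
    CPU_FOREACH(cs) {
        /* CPU_INTERRUPT_SMI is the x86-specific SMI request bit
         * (target/i386/cpu.h) */
        cpu_interrupt(cs, CPU_INTERRUPT_SMI);
    }
    resume_all_vcpus();                       /* let them resume together */
}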
A recent patch,
https://patchew.org/QEMU/20230427211013.2994127-1-alxndr@xxxxxx/20230427211013.2994127-2-alxndr@xxxxxx/
exposed an issue (an unexpected plug/unplug flow) with the root cause in
OVMF: the firmware was letting uninvolved APs run wild in normal mode.
As a result, the AP that called _EJ0 and held the ACPI lock could finish
_EJ0 and release the ACPI lock while the BSP and the CPU being removed
were still in the SMM world. Any other plug/unplug operation could then
grab the ACPI lock and trigger another SMI, which breaks the hotplug
flow's expectations (i.e. exclusive access to the hotplug registers
during a plug/unplug operation).

Perhaps that's what you are observing. Please check if the following
helps:
https://github.com/kraxel/edk2/commit/738c09f6b5ab87be48d754e62deb72b767415158

So yes, a SIPI can be lost (which should be expected, as others noted),
but that normally shouldn't be an issue because
wakeup_secondary_cpu_via_init() resends the SIPI. However, if
wakeup_secondary_cpu is set to another handler that doesn't resend the
SIPI, it might be an issue.
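For reference, the resend comes from the generic x86 wakeup path: the
sketch below loosely paraphrases wakeup_secondary_cpu_via_init() from
arch/x86/kernel/smpboot.c (ICR-idle waits, error checks and the exact
delays are omitted). The point is only that the STARTUP IPI is issued
twice, so a single lost SIPI is normally absorbed, whereas a
wakeup_secondary_cpu override that sends just one SIPI has no such
retry.

/*
 * Loose paraphrase of the kernel's INIT-SIPI-SIPI sequence, not the
 * actual wakeup_secondary_cpu_via_init(); waits on the ICR and error
 * handling are left out for brevity.
 */
#include <asm/apic.h>       /* apic_icr_write(), APIC_DM_*, APIC_INT_* */
#include <linux/delay.h>    /* udelay() */

static void init_sipi_sipi_sketch(u32 apicid, unsigned long start_eip)
{
    int j;

    /* INIT assert, then de-assert */
    apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
                   apicid);
    udelay(10000);
    apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, apicid);

    /*
     * Two STARTUP IPIs: if the AP misses the first one (e.g. it is still
     * inside SMM, where SIPI is ignored), the second one normally still
     * wakes it up.
     */
    for (j = 0; j < 2; j++) {
        apic_icr_write(APIC_DM_STARTUP | (start_eip >> 12), apicid);
        udelay(300);
    }
}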