On 4/16/24 4:53 PM, Paolo Bonzini wrote:
On 4/16/24 22:47, Boris Ostrovsky wrote:
When a processor is running in SMM and receives an INIT message, the
interrupt is left pending until SMM is exited. On the other hand, SIPI,
which typically follows INIT, is discarded. This presents a problem since
the sender has no way of knowing that its SIPI has been dropped, which
results in the processor failing to come up.
Keeping the SIPI pending avoids this scenario.
This is incorrect - it's yet another ugly legacy facet of x86, but we
have to live with it. SIPI is discarded because the code is supposed
to retry it if needed ("INIT-SIPI-SIPI").
I couldn't find in the SDM/APM a definitive statement about whether SIPI
is supposed to be dropped.
The sender should set a flag as early as possible in the SIPI code so
that it's clear that it was not received; and an extra SIPI is not a
problem, it will be ignored anyway and will not cause trouble if
there's a race.
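For illustration, the retry scheme described above could be modelled roughly
as follows. This is only a toy simulation under the assumptions stated in this
thread (INIT received in SMM is held until SMM exit, SIPI received in SMM is
dropped); it is not kernel, KVM or OVMF code, and struct ap, ap_send_init(),
ap_send_sipi(), ap_tick() and bring_up_ap() are hypothetical names made up for
the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the target processor (AP); every name here is hypothetical. */
struct ap {
    int  smm_ticks_left;  /* > 0 while the AP is still in SMM */
    bool init_latched;    /* INIT received in SMM, held until SMM exit */
    bool wait_for_sipi;   /* AP has taken INIT and is waiting for SIPI */
    bool started;         /* flag set first thing by the SIPI entry code */
};

static void ap_send_init(struct ap *ap)
{
    if (ap->smm_ticks_left > 0) {
        ap->init_latched = true;    /* kept pending until SMM exit */
    } else {
        ap->wait_for_sipi = true;
        ap->started = false;
    }
}

static void ap_send_sipi(struct ap *ap)
{
    /* Silently dropped in SMM, or when the AP is not waiting for SIPI. */
    if (ap->smm_ticks_left > 0 || !ap->wait_for_sipi)
        return;
    ap->wait_for_sipi = false;
    ap->started = true;
}

/* Advance time by one step; leaving SMM delivers a latched INIT. */
static void ap_tick(struct ap *ap)
{
    if (ap->smm_ticks_left > 0 && --ap->smm_ticks_left == 0 &&
        ap->init_latched) {
        ap->init_latched = false;
        ap->wait_for_sipi = true;
    }
}

/* Sender side: INIT, then SIPI until the flag shows up or we give up.
 * Returns the number of SIPIs sent, or -1 if the AP never came up. */
static int bring_up_ap(struct ap *ap, int max_sipis)
{
    assert(max_sipis > 0);
    ap_send_init(ap);
    for (int sent = 1; sent <= max_sipis; sent++) {
        ap_send_sipi(ap);
        ap_tick(ap);                /* stand-in for the delay between SIPIs */
        if (ap->started)
            return sent;
    }
    return -1;
}
```

In this model an AP that leaves SMM after one tick loses the first SIPI and
comes up on the second, while an AP that stays in SMM longer than the sender is
willing to retry never comes up. A SIPI sent to an AP that has already started
is simply ignored.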
What is the reproducer for this?
Hotplugging/unplugging CPUs in a loop, especially if you oversubscribe
the guest, will get you there in 10-15 minutes.
Typically (although I think not always) this happens when OVMF is
trying to rendezvous, a processor is missing, and it is sent an extra SMI.
-boris