On 2023/7/11 15:26, Oliver Upton wrote:
On Tue, Jul 11, 2023 at 08:23:25AM +0100, Marc Zyngier wrote:
On Mon, 10 Jul 2023 18:55:53 +0100,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq.
It'd be worth spelling out. You need to go via *three* schedule()
calls: one to be preempted (with DB set), one to be made resident
again, and then the final one in kvm_vcpu_halt(), clearing the DB on
vcpu_put() due to the bug.
Yeah, a bit lazy in the wording. What I had meant to imply was
preemption happening after the doorbell is set up and before the thread
has an opportunity to explicitly schedule out. Perhaps I should just say
that.
Fix it by consistently requesting a doorbell irq in the vcpu put path if
the vCPU is blocking.
Yup. Agreed!
While this technically means we could drop the
early doorbell irq request in kvm_vcpu_wfi(), deliberately leave it
intact such that vCPU halt polling can properly detect the wakeup
condition before actually scheduling out a vCPU.
Yeah, just like what we did in commit 07ab0f8d9a12 ("KVM: Call
kvm_arch_vcpu_blocking early into the blocking sequence").
My only concern is that if the preemption happens before halt polling,
we would enter the polling loop with VPE already resident on the RD and
can't recognize any firing GICv4.x virtual interrupts (targeting this
VPE) in polling. [1]
Given that making VPE resident on the vcpu block path (i.e., in
kvm_vcpu_halt()) makes little sense (right?) and leads to this sort of
problem, a crude idea is that we can probably keep track of the
"nested" vgic_v4_{put,load} calls (instead of a single vpe->resident
flag) and keep VPE *not resident* on the whole block path (like what we
had before commit 8e01d9a396e6). And we then rely on
kvm_vcpu_wfi/vgic_v4_load to actually schedule the VPE on...
The "need doorbell" rule would be simple as before: we do request DB
only if there is a WFI trap (by kvm_vcpu_wfi/vgic_v4_load(vcpu, true)),
and don't need it for any other cases.
Nevertheless [1] is just a matter of performance and shouldn't get in
the way of we fixing the initial bug. ;-)
Reviewed-by: Zenghui Yu <yuzenghui@xxxxxxxxxx>
Thanks,
Zenghui