On 22/10/21 16:56, Maxim Levitsky wrote:
vCPU0                               vCPU1
=====                               =====
- disable AVIC
                                    - #NPF on AVIC MMIO access
                                      vCPU1 is now OUTSIDE_GUEST_MODE
                                    - *stuck on something prior to the page fault code*
- enable AVIC
                                    - *still stuck on something prior to the page fault code*
- disable AVIC:
  - raise KVM_REQ_APICV_UPDATE request
    kvm_make_all_cpus_request does not wait for vCPU1
  - zap the SPTE (does nothing, doesn't race
    with anything either)
  vCPU0 writes mmu_notifier_seq here
                                    - now vCPU1 finally starts running the page fault code.
                                      vCPU1 reads mmu_notifier_seq here
                                    - vCPU1 AVIC is still enabled
                                      (because vCPU1 never handled KVM_REQ_APICV_UPDATE),
                                      so the page fault code will populate the SPTE.
                                    - handle KVM_REQ_APICV_UPDATE
                                      - finally disable vCPU1 AVIC
                                    - VMRUN (vCPU1 AVIC disabled, SPTE populated)
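In code form, the interleaving above boils down to something like the toy
model below (just to make the ordering explicit; this is not actual KVM
code, every identifier in it is made up, and only the comments refer to
the real symbols from the timeline):

/*
 * Standalone model of the race: vCPU0 bumps the sequence before vCPU1's
 * fault handler even starts, while vCPU1's per-vCPU flag is still stale.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long mmu_notifier_seq;      /* models kvm->mmu_notifier_seq   */
static bool vcpu1_apicv_active = true;      /* per-vCPU flag, not updated yet */

/* vCPU0 column: the second "disable AVIC". */
static void vcpu0_disable_avic(void)
{
	/* raise KVM_REQ_APICV_UPDATE; kvm_make_all_cpus_request does not wait */
	/* zap the (not yet existing) SPTE */
	mmu_notifier_seq++;                 /* write side bumps the sequence */
}

/* vCPU1 column: the page fault finally runs, after the bump. */
static bool vcpu1_page_fault(void)
{
	unsigned long mmu_seq = mmu_notifier_seq;   /* fresh snapshot */

	if (!vcpu1_apicv_active)            /* stale: request not handled yet */
		return false;               /* would refuse to map the AVIC page */

	if (mmu_seq != mmu_notifier_seq)    /* mmu_notifier_retry()-style check */
		return false;               /* retry the fault */

	return true;                        /* SPTE for the AVIC page installed */
}

int main(void)
{
	vcpu0_disable_avic();
	printf("AVIC SPTE installed while AVIC is being disabled: %s\n",
	       vcpu1_page_fault() ? "yes" : "no");
	return 0;
}

Both checks pass, so the model happily maps the AVIC page even though the
VM is in the middle of disabling AVIC.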
So yeah, I think you're right.
The VM value doesn't have this problem, because it is always stored
before mmu_notifier_seq is incremented (on the write side) and loaded
after mmu_notifier_seq is read (on the page fault side). Therefore, if
vCPU1 sees a stale per-VM apicv_active flag, it is always going to see a
stale mmu_notifier_seq.
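Spelled out as a toy enumeration (again not real code; the helper and its
parameters are invented for illustration), the fault handler can only see
these combinations:

#include <stdbool.h>
#include <stdio.h>

/*
 * The two values the fault handler looks at are passed in explicitly so
 * that the possible interleavings can be enumerated by hand.
 */
static bool spte_installed(bool observed_apicv_active,
			   unsigned long observed_seq,
			   unsigned long current_seq)
{
	if (!observed_apicv_active)
		return false;               /* refuses to map the AVIC page */
	if (observed_seq != current_seq)
		return false;               /* mmu_notifier_retry()-style bailout */
	return true;                        /* SPTE goes in */
}

int main(void)
{
	unsigned long seq_after_disable = 1;

	/*
	 * Per-VM case: the flag is stored before the seq is bumped, and the
	 * fault side snapshots the seq before loading the flag.  A stale
	 * flag therefore implies a stale snapshot, and the recheck bails out:
	 */
	printf("%d\n", spte_installed(true, 0, seq_after_disable));   /* 0 */

	/* A fresh flag refuses the mapping outright: */
	printf("%d\n", spte_installed(false, 1, seq_after_disable));  /* 0 */
	return 0;
}

The combination from the timeline, "flag still active + up-to-date
snapshot", is exactly the one that cannot be observed when the flag is
per-VM.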
With the per-vCPU flag, instead, the flag is written by
kvm_vcpu_update_apicv, and that can be long after mmu_notifier_seq is
incremented.
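For comparison, here is where the per-vCPU value actually changes, as a
similarly hand-waved model (the request handling is paraphrased, not
copied from the real vcpu_enter_guest() path):

#include <stdbool.h>
#include <stdio.h>

static bool apicv_update_request_pending = true;   /* raised by vCPU0 */
static bool vcpu_apicv_active = true;              /* per-vCPU flag   */

/* Runs only when the vCPU is about to (re)enter the guest. */
static void process_pending_requests(void)
{
	if (apicv_update_request_pending) {
		apicv_update_request_pending = false;
		vcpu_apicv_active = false;  /* the kvm_vcpu_update_apicv() moment */
	}
}

int main(void)
{
	/* The page fault above runs here, still seeing the stale flag... */
	printf("flag during the fault: %d\n", vcpu_apicv_active);

	/* ...and only the next guest entry flips it, unordered against the
	 * mmu_notifier_seq increment done on vCPU0. */
	process_pending_requests();
	printf("flag after handling the request: %d\n", vcpu_apicv_active);
	return 0;
}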
Paolo