On Fri, Feb 23, 2024, Paolo Bonzini wrote: > On 2/3/24 01:23, Sean Christopherson wrote: > > Add full memory barriers in kvm_mmu_track_write() and account_shadowed() > > to plug a (very, very theoretical) race where kvm_mmu_track_write() could > > miss a 0->1 transition of indirect_shadow_pages and fail to zap relevant, > > *stale* SPTEs. > > Ok, so we have > > emulator_write_phys > overwrite PTE > kvm_page_track_write > kvm_mmu_track_write > // memory barrier missing here > if (indirect_shadow_pages) > zap(); > > and on the other side > > FNAME(page_fault) > FNAME(fetch) > kvm_mmu_get_child_sp > kvm_mmu_get_shadow_page > __kvm_mmu_get_shadow_page > kvm_mmu_alloc_shadow_page > account_shadowed > indirect shadow pages++ > // memory barrier missing here > if (FNAME(gpte_changed)) // reads PTE > goto out > > If you can weave something like this in the commit message the sequence > would be a bit clearer. Roger that. > > In practice, this bug is likely benign as both the 0=>1 transition and > > reordering of this scope are extremely rare occurrences. > > I wouldn't call it benign, it's more that it's unobservable in practice but > the race is real. However... > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index 3c193b096b45..86b85060534d 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -830,6 +830,14 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) > > struct kvm_memory_slot *slot; > > gfn_t gfn; > > + /* > > + * Ensure indirect_shadow_pages is elevated prior to re-reading guest > > + * child PTEs in FNAME(gpte_changed), i.e. guarantee either in-flight > > + * emulated writes are visible before re-reading guest PTEs, or that > > + * an emulated write will see the elevated count and acquire mmu_lock > > + * to update SPTEs. Pairs with the smp_mb() in kvm_mmu_track_write(). > > + */ > > + smp_mb(); > > ... this memory barrier needs to be after the increment (the desired > ordering is store-before-read). Doh. I'll post a fixed version as a one-off v3. Thanks!