On Thu, Sep 26, 2019 at 04:18:08PM -0700, Ben Gardon wrote:
> The tlbs_dirty mechanism for deferring flushes can be expanded beyond
> its current use case. This allows MMU operations which do not
> themselves require TLB flushes to notify other threads that there are
> unflushed modifications to the paging structure. In order to use this
> mechanism concurrently, the updates to the global tlbs_dirty must be
> made atomically.

If there is a hard requirement that tlbs_dirty must be updated atomically
then it needs to be an actual atomic so that the requirement is enforced.

> Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx>
> ---
>  arch/x86/kvm/paging_tmpl.h | 29 +++++++++++++----------------
>  1 file changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 97903c8dcad16..cc3630c8bd3ea 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -986,6 +986,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  	bool host_writable;
>  	gpa_t first_pte_gpa;
>  	int set_spte_ret = 0;
> +	int ret;
> +	int tlbs_dirty = 0;
>  
>  	/* direct kvm_mmu_page can not be unsync. */
>  	BUG_ON(sp->role.direct);
> @@ -1004,17 +1006,13 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  		pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
>  
>  		if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
> -					       sizeof(pt_element_t)))
> -			return 0;
> +					       sizeof(pt_element_t))) {
> +			ret = 0;
> +			goto out;
> +		}
>  
>  		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
> -			/*
> -			 * Update spte before increasing tlbs_dirty to make
> -			 * sure no tlb flush is lost after spte is zapped; see
> -			 * the comments in kvm_flush_remote_tlbs().
> -			 */
> -			smp_wmb();
> -			vcpu->kvm->tlbs_dirty++;
> +			tlbs_dirty++;
>  			continue;
>  		}
>  
> @@ -1029,12 +1027,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  
>  		if (gfn != sp->gfns[i]) {
>  			drop_spte(vcpu->kvm, &sp->spt[i]);
> -			/*
> -			 * The same as above where we are doing
> -			 * prefetch_invalid_gpte().
> -			 */
> -			smp_wmb();
> -			vcpu->kvm->tlbs_dirty++;
> +			tlbs_dirty++;
>  			continue;
>  		}
>  
> @@ -1051,7 +1044,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
>  		kvm_flush_remote_tlbs(vcpu->kvm);
>  
> -	return nr_present;
> +	ret = nr_present;
> +
> +out:
> +	xadd(&vcpu->kvm->tlbs_dirty, tlbs_dirty);

Collecting and applying vcpu->kvm->tlbs_dirty updates at the end versus
updating on the fly is a functional change beyond updating tlbs_dirty
atomically.  At a glance, I have no idea whether or not it affects
anything and if so, whether it's correct, i.e. there needs to be an
explanation of why it's safe to combine things into a single update.

> +	return ret;
>  }
>  
>  #undef pt_element_t
> -- 
> 2.23.0.444.g18eeb5a265-goog
> 
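
As a concrete (completely untested) sketch of what I mean by "an actual
atomic": convert tlbs_dirty to an atomic_long_t so that any non-atomic
update fails to compile, and keep a barrier before the deferred update
since non-value-returning atomics don't provide ordering, e.g.

	/* include/linux/kvm_host.h */
	struct kvm {
		...
		atomic_long_t tlbs_dirty;
		...
	};

	/* FNAME(sync_page), instead of the open-coded xadd() */
	out:
		if (tlbs_dirty) {
			/*
			 * Ensure the zapped sptes are visible before bumping
			 * tlbs_dirty, i.e. preserve the ordering the old
			 * per-spte smp_wmb() provided; see the comments in
			 * kvm_flush_remote_tlbs().
			 */
			smp_mb__before_atomic();
			atomic_long_add(tlbs_dirty, &vcpu->kvm->tlbs_dirty);
		}
		return ret;

The readers would then need to be converted to atomic_long_read() and
atomic_long_cmpxchg(), but IMO that's a net positive as it makes the
lockless nature of tlbs_dirty explicit.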