On Mon, Apr 11, 2022, Mingwei Zhang wrote:
> On Sat, Apr 09, 2022, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 671cfeccf04e..89df062d5921 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -191,6 +191,15 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  		.user = err & PFERR_USER_MASK,
> >  		.prefetch = prefetch,
> >  		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> > +
> > +		/*
> > +		 * Note, enforcing the NX huge page mitigation for nonpaging
> > +		 * MMUs (shadow paging, CR0.PG=0 in the guest) is completely
> > +		 * unnecessary.  The guest doesn't have any page tables to
> > +		 * abuse and is guaranteed to switch to a different MMU when
> > +		 * CR0.PG is toggled on (may not always be guaranteed when KVM
> > +		 * is using TDP).  See make_spte() for details.
> > +		 */
> >  		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
>
> Hmm, I think there could be a minor issue here (even in the original
> code).  nx_huge_page_workaround_enabled is attached to the page fault
> here, but at the time of make_spte() we call is_nx_huge_page_enabled()
> again.  Since that function directly checks the module parameter, there
> might be a race condition, e.g. at the time of the page fault the
> workaround was 'true', while by the time we reach make_spte() the
> parameter had been set to 'false'.

Toggling the mitigation invalidates and zaps all roots.  Any page fault
that acquires mmu_lock after the toggling is guaranteed to see the
correct value, and any SPTE created by a page fault that completed
before kvm_mmu_zap_all_fast() is guaranteed to be zapped along with the
invalidated roots.

> I have not figured out what the side effect is, but I feel like
> make_spte() should just follow the information in kvm_page_fault
> instead of directly querying the global config.

I started down this exact path :-)  The problem is that, even without
Ben's series, KVM uses make_spte() for things other than page faults.
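
For anyone who wants to convince themselves of the ordering, below is a
rough userspace model of the serialization argued above -- not the
actual KVM code.  mmu_lock is a plain mutex, "zap all roots" is a
generation bump, and fault_in(), toggle_mitigation() and fake_spte are
names made up purely for illustration:

/*
 * Rough userspace model of the serialization, NOT the actual KVM code.
 * All names here are illustrative.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t mmu_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool nx_huge_pages = true;	/* the module param */
static unsigned long mmu_valid_gen;		/* bumped on "zap all" */

struct fake_spte {
	bool nx_workaround;	/* value snapshotted at fault time */
	unsigned long gen;	/* generation the entry was created in */
};

/* Modeled after the page fault path. */
static void fault_in(struct fake_spte *spte)
{
	unsigned long start_gen;
	bool snapshot;

	for (;;) {
		pthread_mutex_lock(&mmu_lock);
		start_gen = mmu_valid_gen;	/* "root" the fault runs on */
		pthread_mutex_unlock(&mmu_lock);

		/* Snapshot outside the lock, a la is_nx_huge_page_enabled(). */
		snapshot = atomic_load(&nx_huge_pages);

		pthread_mutex_lock(&mmu_lock);
		if (start_gen != mmu_valid_gen) {
			/* Roots were zapped in the window, retry the fault. */
			pthread_mutex_unlock(&mmu_lock);
			continue;
		}
		spte->nx_workaround = snapshot;
		spte->gen = mmu_valid_gen;
		pthread_mutex_unlock(&mmu_lock);
		return;
	}
}

/* Modeled after toggling the param: flip first, then "zap all roots". */
static void toggle_mitigation(bool val)
{
	atomic_store(&nx_huge_pages, val);

	pthread_mutex_lock(&mmu_lock);
	mmu_valid_gen++;	/* stands in for kvm_mmu_zap_all_fast() */
	pthread_mutex_unlock(&mmu_lock);
}

/* An entry created before the zap is detectably dead. */
static bool spte_is_stale(const struct fake_spte *spte)
{
	bool stale;

	pthread_mutex_lock(&mmu_lock);
	stale = spte->gen != mmu_valid_gen;
	pthread_mutex_unlock(&mmu_lock);
	return stale;
}

int main(void)
{
	struct fake_spte spte;

	fault_in(&spte);		/* installs with the old value */
	toggle_mitigation(false);	/* flips param, then zaps */

	/* The racy snapshot is harmless: the entry is already dead. */
	printf("stale after toggle: %d\n", spte_is_stale(&spte));
	return 0;
}

The takeaway: a fault that races with the toggle either installs its
entry in the old generation, in which case it dies with the zap, or it
observes the generation bump under mmu_lock and retries with the new
parameter value.  Either way the stale snapshot never outlives the zap.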