On Wed, Jul 28, 2021 at 06:37:38PM +0000, Sean Christopherson wrote:
> On Wed, Jul 28, 2021, Yu Zhang wrote:
> > Thanks a lot for your reply, Sean.
> >
> > On Tue, Jul 27, 2021 at 06:07:35PM +0000, Sean Christopherson wrote:
> > > On Wed, Jul 28, 2021, Yu Zhang wrote:
> > > > Hi all,
> > > >
> > > > I'd like to ask a question about kvm_mmu_reset_context(): is there any
> > > > reason that we must always unload the TDP root in kvm_mmu_reset_context()?
> > >
> > > The short answer is that mmu_role is changing, thus a new root shadow page is
> > > needed.
> >
> > I saw that the mmu_role is recalculated, but I have not figured out how this
> > change would affect TDP. Could you give an example? Thanks!
> >
> > I realized that if we only recalculate the mmu_role, but do not unload
> > the TDP root (e.g., when guest EFER.NX flips), the base role of the SPs will
> > be inconsistent with the mmu context. But I do not understand why this
> > should affect TDP.
>
> The SPTEs themselves are not affected if the base mmu_role doesn't change; note,
> this holds true for shadow paging, too. What changes is all of the kvm_mmu
> knowledge about how to walk the guest PTEs, e.g. if a guest toggles CR4.SMAP,
> then KVM needs to recalculate the #PF permissions for guest accesses so that
> emulating instructions at CPL=0 does the right thing.
>
> As for EFER.NX and CR0.WP, they are in the base page role because they need to
> be there for shadow paging, e.g. if the guest toggles EFER.NX, then the reserved
> bit and executable permissions change, and reusing shadow pages built for the old
> EFER.NX could result in missed reserved #PF and/or incorrect executable #PF
> behavior.
>
> For simplicity, it's far, far easier to reuse the same page role struct for
> TDP paging (both legacy and TDP MMUs) and shadow paging.
>
> However, I think we can safely ignore NX, WP, SMEP, and SMAP in direct shadow
> pages, which would allow reusing a TDP root across changes.  This is only a baby
> step (assuming it even works), as further changes to set_cr0/cr4/efer would be
> needed to fully realize the optimizations, e.g. to avoid a complete teardown if
> the root_count hits zero.

Thanks for your explanation, Sean. And I fully agree!

As you can see in my first mail, I kept reinitializing the mmu_role in
kvm_mmu_reset_context(), so that guest paging mode changes are still handled
correctly by the guest page table walker. As for shadow paging, the unload is
always needed, because the NX and WP bits of existing SPs matter.

+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu, bool force_tdp_unload)
 {
-	kvm_mmu_unload(vcpu);
+	if (!tdp_enabled || force_tdp_unload)
+		kvm_mmu_unload(vcpu);
+
 	kvm_init_mmu(vcpu);
 }

In the caller, force_tdp_unload is set to false for CR0/CR4/EFER changes, and
to true for SMM and CPUID updates.

With this change, I can successfully boot a VM (and, of course, the number of
unloads is greatly reduced). But the access test case in kvm-unit-tests hangs
after CR4.SMEP is flipped. I'm trying to figure out why...

> I'll put this on my todo list; I've been looking for an excuse to update the
> cr0/cr4/efer flows anyways :-). If it works, the changes should be relatively
> minor, if it works...
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a8cdfd8d45c4..700664fe163e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2077,8 +2077,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>         role = vcpu->arch.mmu->mmu_role.base;
>         role.level = level;
>         role.direct = direct;
> -       if (role.direct)
> +       if (role.direct) {
>                 role.gpte_is_8_bytes = true;
> +
> +               /*
> +                * Guest PTE permissions do not impact SPTE permissions for
> +                * direct MMUs. Either there are no guest PTEs (CR0.PG=0) or
> +                * guest PTE permissions are enforced by the CPU (TDP enabled).
> +                */
> +               WARN_ON_ONCE(access != ACC_ALL);
> +               role.efer_nx = 0;
> +               role.cr0_wp = 0;
> +               role.smep_andnot_wp = 0;
> +               role.smap_andnot_wp = 0;
> +       }

How about we do this in kvm_calc_mmu_role_common()? :-)

Thanks
Yu

>         role.access = access;
>         if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
>                 quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));