I noticed that fast_cr3_switch() always fails when we switch back from L2
to L1 as it is not able to find a cached root. This is odd: the host's CR3
usually stays the same, so we expect to always follow the fast path. It
turns out that the page role is always mismatched: kvm_mmu_get_page()
filters out cr4_pae when direct, the resulting value is stored in the page
header and is later compared with new_role in cached_root_available().
As cr4_pae is always set in long mode, the prev_roots cache is
dysfunctional.

The problem appeared after we introduced kvm_calc_mmu_role_common():
previously, we were only setting this bit for shadow MMU root but now we
set it for everything. Restore the original behavior.

Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
RFC: Alternatively, I can suggest two solutions:
- Do not clear cr4_pae in kvm_mmu_get_page() and check direct on call sites
  (detect_write_misaligned(), get_written_sptes()).
- Filter cr4_pae out when direct in kvm_mmu_new_cr3().
The code in kvm_mmu_get_page() is very ancient, I'm afraid to touch it :=)
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f2d1d230d5b8..c729e98eee49 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4791,7 +4791,6 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
 
 	role.base.access = ACC_ALL;
 	role.base.nxe = !!is_nx(vcpu);
-	role.base.cr4_pae = !!is_pae(vcpu);
 	role.base.cr0_wp = is_write_protection(vcpu);
 	role.base.smm = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
@@ -4871,6 +4870,8 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 {
 	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, base_only);
 
+	role.base.cr4_pae = !!is_pae(vcpu);
+
 	role.base.smep_andnot_wp = role.ext.cr4_smep &&
 		!is_write_protection(vcpu);
 	role.base.smap_andnot_wp = role.ext.cr4_smap &&
-- 
2.20.1
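
For illustration only, the role mismatch described in the commit message
can be modelled in a few lines of standalone C. This is a rough sketch and
not kernel code: union sketch_page_role and all sketch_*() helpers are
hypothetical names, the role is reduced to three fields, the cache lookup
is assumed to compare the whole role word, and TDP roots are assumed to be
direct while shadow roots are treated as non-direct.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily reduced stand-in for the kernel's page role.
 * Only the fields relevant to this patch are modelled.
 */
union sketch_page_role {
	uint32_t word;
	struct {
		unsigned level:4;
		unsigned cr4_pae:1;
		unsigned direct:1;
	};
};

/*
 * Models the role calculation. Before the patch (patched == false)
 * cr4_pae is set for every MMU; after it, only for the shadow MMU.
 * Assumption: TDP roots are direct, shadow roots are not.
 */
static union sketch_page_role sketch_calc_role(bool pae, bool tdp, bool patched)
{
	union sketch_page_role role = { .word = 0 };

	role.level = 4;
	role.direct = tdp;
	if (!patched || !tdp)
		role.cr4_pae = pae;
	return role;
}

/*
 * Models the filtering the commit message attributes to
 * kvm_mmu_get_page(): cr4_pae is cleared for direct pages before
 * the role is stored in the page header.
 */
static union sketch_page_role sketch_stored_role(union sketch_page_role role)
{
	if (role.direct)
		role.cr4_pae = 0;
	return role;
}

/*
 * Models the lookup in cached_root_available(), assumed here to be
 * a comparison of the whole role word.
 */
static bool sketch_root_matches(union sketch_page_role stored,
				union sketch_page_role new_role)
{
	return stored.word == new_role.word;
}
```

In this model, a direct root computed with patched == false never matches
its own stored role when PAE is enabled, because the stored copy has
cr4_pae cleared while new_role still carries it. With patched == true the
two words are identical and the lookup succeeds, while a shadow root keeps
cr4_pae set in both places.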