Junaid Shahid <junaids@xxxxxxxxxx> writes: > On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote: >> Undesired triple fault gets injected to L1 guest on SVM when L2 is >> launched with certain CR3 values. It seems the mmu_check_root() >> check in fast_pgd_switch() is wrong: first of all we don't know >> if 'new_pgd' is a GPA or a nested GPA and, in case it is a nested >> GPA, we can't check it with kvm_is_visible_gfn(). >> >> The problematic code path is: >> nested_svm_vmrun() >> ... >> nested_prepare_vmcb_save() >> kvm_set_cr3(..., nested_vmcb->save.cr3) >> kvm_mmu_new_pgd() >> ... >> mmu_check_root() -> TRIPLE FAULT >> >> The mmu_check_root() check in fast_pgd_switch() seems to be >> superfluous even for non-nested case: when GPA is outside of the >> visible range cached_root_available() will fail for non-direct >> roots (as we can't have a matching one on the list) and we don't >> seem to care for direct ones. >> >> Also, raising #TF immediately when a non-existent GFN is written to CR3 >> doesn't seem to mach architecture behavior. >> >> Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path") >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> >> --- >> - The patch fixes the immediate issue and doesn't seem to break any >> tests even with shadow PT but I'm not sure I properly understood >> why the check was there in the first place. Please review! >> --- >> arch/x86/kvm/mmu/mmu.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >> index 76817d13c86e..286c74d2ae8d 100644 >> --- a/arch/x86/kvm/mmu/mmu.c >> +++ b/arch/x86/kvm/mmu/mmu.c >> @@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd, >> */ >> if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL && >> mmu->root_level >= PT64_ROOT_4LEVEL) >> - return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) && >> - cached_root_available(vcpu, new_pgd, new_role); >> + return cached_root_available(vcpu, new_pgd, new_role); >> >> return false; >> } >> > > The check does seem superfluous, so should be ok to remove. Though I > think that fast_pgd_switch() really should be getting only L1 > GPAs. Otherwise, there could be confusion between the same GPAs from > two different L2s. > > IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for > L2) or L1 EPTPs should get to fast_pgd_switch(). But I am not familiar > enough with SVM to see why an L2 GPA would end up there. From a > cursory look, it seems that until "978ce5837c7e KVM: SVM: always > update CR3 in VMCB", enter_svm_guest_mode() was calling kvm_set_cr3() > only when using shadow paging, in which case I assume that > nested_vmcb->save.cr3 would have been an L1 CR3 shadowing the L2 CR3, > correct? But now kvm_set_cr3() is called even when not using shadow > paging, which I suppose is how we are getting the L2 CR3. Should we > skip calling fast_pgd_switch() in that particular case? Thank you for your thoughts, this is helpful indeed. As far as I can see, nVMX calls kvm_mmu_new_pgd() directly in two cases: 1) nested_ept_init_mmu_context() -> kvm_init_shadow_ept_mmu() when switching to L2 (and 'new_pgd' is EPTP) 2) nested_vmx_load_cr3() when !nested_ept (and 'new_pgd' is 'cr3') I think we need to do something similar for nSVM to make the root cache work and work correctly: 1) supplement nested_svm_init_mmu_context() -> kvm_init_shadow_mmu() with kvm_mmu_new_pgd() 2) stop doing kvm_mmu_new_pgd() from nested_prepare_vmcb_save() -> kvm_set_cr3() when npt_enabled Let me try to experiment here a bit. Thanks again! -- Vitaly