On Thu, Apr 08, 2021, Paolo Bonzini wrote:
> On 08/04/21 13:15, Wanpeng Li wrote:
> > I saw this splatting:
> >
> >   BUG: sleeping function called from invalid context at
> >   arch/x86/kvm/kvm_cache_regs.h:115
> >    kvm_pdptr_read+0x20/0x60 [kvm]
> >    kvm_mmu_load+0x3bd/0x540 [kvm]
> >
> > There is a might_sleep() in kvm_pdptr_read(); however, the original
> > commit didn't explain more. I can send a formal one if the below fix
> > is acceptable.

We don't want to drop mmu_lock, even temporarily. The reason for
holding it across the entire sequence is to ensure
kvm_mmu_available_pages() isn't violated.

> I think we can just push make_mmu_pages_available down into
> kvm_mmu_load's callees. This way it's not necessary to hold the lock
> until after the PDPTR check:

...

> @@ -4852,14 +4868,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  	r = mmu_alloc_special_roots(vcpu);
>  	if (r)
>  		goto out;
> -	write_lock(&vcpu->kvm->mmu_lock);
> -	if (make_mmu_pages_available(vcpu))
> -		r = -ENOSPC;
> -	else if (vcpu->arch.mmu->direct_map)
> +	if (vcpu->arch.mmu->direct_map)
>  		r = mmu_alloc_direct_roots(vcpu);
>  	else
>  		r = mmu_alloc_shadow_roots(vcpu);
> -	write_unlock(&vcpu->kvm->mmu_lock);
>  	if (r)
>  		goto out;

Freaking PDPTRs.

I was really hoping we could keep the lock and pages_available() logic
outside of the helpers. What if kvm_mmu_load() reads the PDPTRs and
passes them into mmu_alloc_shadow_roots()? Or is that too ugly?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efb41f31e80a..e3c4938cd665 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3275,11 +3275,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu, u64 pdptrs[4])
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
-	u64 pdptrs[4], pm_mask;
 	gfn_t root_gfn, root_pgd;
+	u64 pm_mask;
 	hpa_t root;
 	int i;
 
@@ -3291,11 +3291,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
 	if (mmu->root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
-			pdptrs[i] = mmu->get_pdptr(vcpu, i);
-			if (!(pdptrs[i] & PT_PRESENT_MASK))
-				continue;
-
-			if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
+			if ((pdptrs[i] & PT_PRESENT_MASK) &&
+			    mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
 				return 1;
 		}
 	}
@@ -4844,21 +4841,33 @@ EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
 int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
-	int r;
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	u64 pdptrs[4];
+	int r, i;
 
-	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+	r = mmu_topup_memory_caches(vcpu, !mmu->direct_map);
 	if (r)
 		goto out;
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
+
+	/*
+	 * On SVM, reading PDPTRs might access guest memory, which might fault
+	 * and thus might sleep.  Grab the PDPTRs before acquiring mmu_lock.
+	 */
+	if (!mmu->direct_map && mmu->root_level == PT32E_ROOT_LEVEL) {
+		for (i = 0; i < 4; ++i)
+			pdptrs[i] = mmu->get_pdptr(vcpu, i);
+	}
+
 	write_lock(&vcpu->kvm->mmu_lock);
 	if (make_mmu_pages_available(vcpu))
 		r = -ENOSPC;
 	else if (vcpu->arch.mmu->direct_map)
 		r = mmu_alloc_direct_roots(vcpu);
 	else
-		r = mmu_alloc_shadow_roots(vcpu);
+		r = mmu_alloc_shadow_roots(vcpu, pdptrs);
 	write_unlock(&vcpu->kvm->mmu_lock);
 	if (r)
 		goto out;
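
As an aside, the proposal above is an instance of a general locking
pattern: hoist work that may sleep (here, reading PDPTRs from guest
memory) out of a critical section guarded by a non-sleeping lock, and
pass the pre-read values into the locked region. Below is a minimal,
self-contained userspace sketch of that pattern. Every name in it
(read_pdptr_may_block(), build_roots(), load_roots()) is invented for
illustration; this is not KVM code, just the shape of the fix.

/*
 * Hypothetical userspace analogue of the "read before locking" fix.
 * Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static pthread_spinlock_t lock;
static uint64_t roots;

/*
 * Stand-in for mmu->get_pdptr(): may block (e.g. faulting in guest
 * memory on SVM), so it must not run while the spinlock is held.
 */
static uint64_t read_pdptr_may_block(int index)
{
	usleep(10);			/* simulate a potentially-sleeping access */
	return (uint64_t)index | 1;	/* bit 0 stands in for PT_PRESENT_MASK */
}

/*
 * Stand-in for mmu_alloc_shadow_roots(): consumes the pre-read values
 * and never sleeps, so it is safe inside the critical section.
 */
static int build_roots(const uint64_t pdptrs[4])
{
	int i;

	for (i = 0; i < 4; ++i)
		if (pdptrs[i] & 1)
			roots |= pdptrs[i];
	return 0;
}

static int load_roots(void)
{
	uint64_t pdptrs[4];
	int i, r;

	/* Do all potentially-sleeping reads first, outside the lock... */
	for (i = 0; i < 4; ++i)
		pdptrs[i] = read_pdptr_may_block(i);

	/* ...then keep the non-sleeping critical section short. */
	pthread_spin_lock(&lock);
	r = build_roots(pdptrs);
	pthread_spin_unlock(&lock);

	return r;
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	printf("load_roots() = %d\n", load_roots());
	pthread_spin_destroy(&lock);
	return 0;
}

The trade-off of this pattern is that the hoisted values can go stale
before the lock is taken, so the consumer has to either re-validate
them under the lock or tolerate the staleness; dropping and re-taking
the lock instead is not an option here, since, as noted above, the
lock must be held across the whole sequence so that
kvm_mmu_available_pages() isn't violated.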