On 10/08/2017 15:55, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>
> watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
> irq event stamp: 20532
> hardirqs last enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
> hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
> softirqs last enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
> softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
> CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8
> RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
> Call Trace:
>  make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
>  kvm_mmu_load+0x1cf/0x410 [kvm]
>  kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
>  kvm_vcpu_ioctl+0x340/0x700 [kvm]
>  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>  ? __fget+0xfc/0x210
>  do_vfs_ioctl+0xa4/0x6a0
>  ? __fget+0x11d/0x210
>  SyS_ioctl+0x79/0x90
>  entry_SYSCALL_64_fastpath+0x23/0xc2
>  ? __this_cpu_preempt_check+0x13/0x20
>
> This can be reproduced readily by ept=N and running syzkaller tests since
> many syzkaller testcases don't setup any memory regions. However, if ept=Y
> rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will
> extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES
> which just hide the issue.
>
> I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1,
> so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails
> to zap any pages, however prepare_zap_oldest_mmu_page() always returns true.
> It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock
> softlockup.
>
> This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page()
> according to whether or not there is mmu page zapped. In addition, we bail out
> immediately if there is no available mmu page to alloc root page.

Nice!
But I think all callers of make_mmu_pages_available should be handled
the same way. I'm committing the first hunk for now. In the meanwhile,
can you look into returning -ENOSPC from make_mmu_pages_available if
!kvm_mmu_available_pages after zapping the pages?

Thanks,

Paolo

> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9b1dd11..b9897e8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2608,9 +2608,7 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
>
>  	sp = list_last_entry(&kvm->arch.active_mmu_pages,
>  			     struct kvm_mmu_page, link);
> -	kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
> -
> -	return true;
> +	return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
>  }
>
>  /*
> @@ -3379,6 +3377,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
>  		spin_lock(&vcpu->kvm->mmu_lock);
>  		make_mmu_pages_available(vcpu);
> +		if (!kvm_mmu_available_pages(vcpu->kvm)) {
> +			spin_unlock(&vcpu->kvm->mmu_lock);
> +			return 1;
> +		}
>  		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
>  		++sp->root_count;
>  		spin_unlock(&vcpu->kvm->mmu_lock);
> @@ -3390,6 +3392,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
>  			MMU_WARN_ON(VALID_PAGE(root));
>  			spin_lock(&vcpu->kvm->mmu_lock);
>  			make_mmu_pages_available(vcpu);
> +			if (!kvm_mmu_available_pages(vcpu->kvm)) {
> +				spin_unlock(&vcpu->kvm->mmu_lock);
> +				return 1;
> +			}
>  			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
>  					i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
>  			root = __pa(sp->spt);
> @@ -3427,6 +3433,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>
>  		spin_lock(&vcpu->kvm->mmu_lock);
>  		make_mmu_pages_available(vcpu);
> +		if (!kvm_mmu_available_pages(vcpu->kvm)) {
> +			spin_unlock(&vcpu->kvm->mmu_lock);
> +			return 1;
> +		}
>  		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
>  				      0, ACC_ALL);
>  		root = __pa(sp->spt);
>
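
To make the -ENOSPC suggestion concrete, here is a rough userspace model of
the control flow, not kernel code. Everything in it is an assumption made for
illustration: struct toy_kvm, the toy_* helpers, the n_zappable counter and
the TOY_MIN_FREE threshold are hypothetical stand-ins, and the real
make_mmu_pages_available() takes a vcpu and runs under mmu_lock. The point is
only that once the zap helper reports failure, the loop can stop and the
function can tell its callers that no page is available.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical threshold for this sketch; not the kernel's constant. */
#define TOY_MIN_FREE 5

/* Toy stand-in for the per-VM MMU page accounting (not struct kvm). */
struct toy_kvm {
	unsigned int n_max_mmu_pages;
	unsigned int n_used_mmu_pages;
	unsigned int n_zappable;	/* pages a zap attempt can really free */
};

/* Free page budget: max minus used, clamped at zero. */
unsigned int toy_available_pages(const struct toy_kvm *kvm)
{
	if (kvm->n_max_mmu_pages > kvm->n_used_mmu_pages)
		return kvm->n_max_mmu_pages - kvm->n_used_mmu_pages;
	return 0;
}

/* Returns true only if a page was actually zapped (cf. the first hunk). */
bool toy_zap_oldest(struct toy_kvm *kvm)
{
	if (kvm->n_used_mmu_pages == 0 || kvm->n_zappable == 0)
		return false;
	kvm->n_zappable--;
	kvm->n_used_mmu_pages--;
	return true;
}

/*
 * Zap until enough pages are free or nothing more can be zapped, then
 * report -ENOSPC if the VM still has no page available at all.
 */
int toy_make_pages_available(struct toy_kvm *kvm)
{
	while (toy_available_pages(kvm) < TOY_MIN_FREE) {
		if (!toy_zap_oldest(kvm))
			break;	/* no progress: do not spin forever */
	}

	if (!toy_available_pages(kvm))
		return -ENOSPC;
	return 0;
}
```

With n_max_mmu_pages == 0, n_used_mmu_pages == 1 and nothing zappable (the
scenario from the commit message), the model returns -ENOSPC instead of
looping. Each caller could then propagate that error rather than open-coding
its own kvm_mmu_available_pages() check.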