On Fri, Jan 24, 2025, Paolo Bonzini wrote:
> Protect the whole function with kvm_lock() so that all accesses to
> nx_hugepage_mitigation_hard_disabled are under the lock; but drop it
> when calling out to the MMU to avoid complex circular locking
> situations such as the following:

...

> To break the deadlock, release kvm_lock while taking kvm->slots_lock, which
> breaks the chain:

Heh, except it's all kinds of broken.  IMO, biting the bullet and converting to
an SRCU-protected list is going to be far less work in the long run (untested
sketch at the bottom of this mail).

> @@ -7143,16 +7141,19 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>  	if (new_val != old_val) {
>  		struct kvm *kvm;
>  
> -		mutex_lock(&kvm_lock);
> -
>  		list_for_each_entry(kvm, &vm_list, vm_list) {

This is unsafe, as vm_list can be modified while kvm_lock is dropped.  And using
list_for_each_entry_safe() doesn't help, because the _next_ entry could have
been freed.

> +			kvm_get_kvm(kvm);

This needs to be:

			if (!kvm_get_kvm_safe(kvm))
				continue;

because the last reference to the VM could already have been put.

> +			mutex_unlock(&kvm_lock);
> +
>  			mutex_lock(&kvm->slots_lock);
>  			kvm_mmu_zap_all_fast(kvm);
>  			mutex_unlock(&kvm->slots_lock);
>  
>  			vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);

See my bug report on this being a NULL pointer deref.

> +
> +			mutex_lock(&kvm_lock);
> +			kvm_put_kvm(kvm);

The order is backwards, kvm_put_kvm() needs to be called before acquiring
kvm_lock.  If the last reference is put, kvm_put_kvm() => kvm_destroy_vm() will
deadlock on kvm_lock.

>  		}
> -		mutex_unlock(&kvm_lock);
>  	}
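
As for the SRCU conversion, the below is roughly what I'm thinking.  Completely
untested sketch, and it assumes several things that don't exist today: a
dedicated vm_list_srcu srcu_struct, vm_list insertion/removal converted to
list_add_rcu()/list_del_rcu(), and the final freeing of "struct kvm" deferred
past an SRCU grace period (e.g. via call_srcu()) so that the walk can continue
after the local reference is dropped.

	struct kvm *kvm;
	int idx;

	idx = srcu_read_lock(&vm_list_srcu);
	list_for_each_entry_srcu(kvm, &vm_list, vm_list,
				 srcu_read_lock_held(&vm_list_srcu)) {
		/* Skip VMs whose last reference has already been put. */
		if (!kvm_get_kvm_safe(kvm))
			continue;

		mutex_lock(&kvm->slots_lock);
		kvm_mmu_zap_all_fast(kvm);
		mutex_unlock(&kvm->slots_lock);

		/* The NULL pointer deref reported separately isn't addressed here. */
		vhost_task_wake(kvm->arch.nx_huge_page_recovery_thread);

		/*
		 * Dropping the reference inside the read-side section is safe
		 * only under the assumption that "struct kvm" is freed via
		 * call_srcu(), i.e. not before this walk finishes.
		 */
		kvm_put_kvm(kvm);
	}
	srcu_read_unlock(&vm_list_srcu, idx);

That would get rid of the kvm_lock juggling in this path entirely; the only
per-VM lock taken is slots_lock.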