On Tue, Nov 26, 2019 at 01:44:14PM -0300, Leonardo Bras wrote:
> On Mon, 2019-10-21 at 15:58 -0700, Sean Christopherson wrote:

...

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 67ef3f2e19e8..b8534c6b8cf6 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -772,6 +772,18 @@ void kvm_put_kvm(struct kvm *kvm)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_put_kvm);
> >
> > +/*
> > + * Used to put a reference that was taken on behalf of an object associated
> > + * with a user-visible file descriptor, e.g. a vcpu or device, if installation
> > + * of the new file descriptor fails and the reference cannot be transferred to
> > + * its final owner.  In such cases, the caller is still actively using @kvm and
> > + * will fail miserably if the refcount unexpectedly hits zero.
> > + */
> > +void kvm_put_kvm_no_destroy(struct kvm *kvm)
> > +{
> > +	WARN_ON(refcount_dec_and_test(&kvm->users_count));
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);
> >
> >  static int kvm_vm_release(struct inode *inode, struct file *filp)
> >  {
> > @@ -2679,7 +2691,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> >  	kvm_get_kvm(kvm);
> >  	r = create_vcpu_fd(vcpu);
> >  	if (r < 0) {
> > -		kvm_put_kvm(kvm);
> > +		kvm_put_kvm_no_destroy(kvm);
> >  		goto unlock_vcpu_destroy;
> >  	}
> >
> > @@ -3117,7 +3129,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
> >  	kvm_get_kvm(kvm);
> >  	ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
> >  	if (ret < 0) {
> > -		kvm_put_kvm(kvm);
> > +		kvm_put_kvm_no_destroy(kvm);
> >  		mutex_lock(&kvm->lock);
> >  		list_del(&dev->vm_node);
> >  		mutex_unlock(&kvm->lock);
>
> Hello,
>
> I see what are you solving here, but would not this behavior cause the
> refcount to reach negative values?
>
> If so, is not there a problem? I mean, in some archs (powerpc included)
> refcount_dec_and_test() will decrement and then test if the value is
> equal 0. If we ever reach a negative value, this will cause that memory
> to never be released.
>
> An example is that refcount_dec_and_test(), on other archs than x86,
> will call atomic_dec_and_test(), which on
> include/linux/atomic-fallback.h will do:
>
> return atomic_dec_return(v) == 0;
>
> To change this behavior, it would mean change the whole atomic_*_test
> behavior, or do a copy function in order to change this '== 0' to
> '<= 0'.
>
> Does it make sense? Do you need any help on this?

I don't think so.  refcount_dec_and_test() will WARN on an underflow when
the kernel is built with CONFIG_REFCOUNT_FULL=y.  I see no value in
duplicating those sanity checks in KVM.

This new helper and WARN is to explicitly catch @users_count unexpectedly
hitting zero, which is orthogonal to an underflow (although odds are good
that a bug that triggers the WARN in kvm_put_kvm_no_destroy() will also
lead to an underflow).  Leaking the memory is deliberate as the alternative
is a guaranteed use-after-free, i.e. kvm_put_kvm_no_destroy() is intended
to be used when users_count is guaranteed to be valid after it is
decremented.
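To make the distinction between "hit zero" and "underflow" concrete, here is a minimal user-space sketch of the two put variants discussed above. It is not kernel code. The names `toy_vm`, `toy_get`, `toy_put`, `toy_put_no_destroy`, and the `warnings`/`destroyed` fields are hypothetical stand-ins, and C11 atomics are used in place of refcount_t. The only behavior borrowed from the thread is the "decrement, then compare with zero" test.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kvm; only the counter matters here. */
struct toy_vm {
	atomic_int users_count;
	int warnings;		/* how many times the stand-in WARN fired */
	bool destroyed;		/* set where a real put would free the object */
};

static void toy_init(struct toy_vm *vm)
{
	atomic_init(&vm->users_count, 1);	/* the creator's reference */
	vm->warnings = 0;
	vm->destroyed = false;
}

/* Decrement, then report whether the new value is exactly zero. */
static bool toy_dec_and_test(struct toy_vm *vm)
{
	return atomic_fetch_sub(&vm->users_count, 1) - 1 == 0;
}

static void toy_get(struct toy_vm *vm)
{
	atomic_fetch_add(&vm->users_count, 1);
}

/* Normal put: the last reference tears the object down. */
static void toy_put(struct toy_vm *vm)
{
	if (toy_dec_and_test(vm))
		vm->destroyed = true;
}

/*
 * "No destroy" put: the caller still holds its own reference, so reaching
 * zero here is a bug. Complain and deliberately leak instead of freeing
 * an object the caller is still using.
 */
static void toy_put_no_destroy(struct toy_vm *vm)
{
	if (toy_dec_and_test(vm)) {
		vm->warnings++;
		fprintf(stderr, "WARN: users_count unexpectedly hit zero\n");
	}
}
```

In the expected flow (take a reference for a new fd, fail to install the fd, drop that reference with `toy_put_no_destroy()`), the count returns to its previous nonzero value and nothing fires. The warning only triggers if the caller's own reference was already missing, which is the bug the helper is meant to surface.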