On Wed, 10 Feb 2021 17:05:48 -0500 Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote: > On 2/10/21 10:32 AM, Halil Pasic wrote: > > On Wed, 10 Feb 2021 16:24:29 +0100 > > Halil Pasic <pasic@xxxxxxxxxxxxx> wrote: > > > >>> Maybe you could > >>> - grab a reference to kvm while holding the lock > >>> - call the mask handling functions with that kvm reference > >>> - lock again, drop the reference, and do the rest of the processing? > >> I agree, matrix_mdev->kvm can go NULL any time and we are risking > >> a null pointer dereference here. > >> > >> Another idea would be to do > >> > >> > >> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) > >> { > >> struct kvm *kvm; > >> > >> mutex_lock(&matrix_dev->lock); > >> if (matrix_mdev->kvm) { > >> kvm = matrix_mdev->kvm; > >> matrix_mdev->kvm = NULL; > >> mutex_unlock(&matrix_dev->lock); > >> kvm_arch_crypto_clear_masks(kvm); > >> mutex_lock(&matrix_dev->lock); > >> matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; > > s/matrix_mdev->kvm/kvm > >> vfio_ap_mdev_reset_queues(matrix_mdev->mdev); > >> kvm_put_kvm(kvm); > >> } > >> mutex_unlock(&matrix_dev->lock); > >> } > >> > >> That way only one unset would actually do the unset and cleanup > >> and every other invocation would bail out with only checking > >> matrix_mdev->kvm. > > But the problem with that is that we enable the the assign/unassign > > prematurely, which could interfere wit reset_queues(). Forget about > > it. > > Not sure what you mean by this. > > I mean because above I first do (1) matrix_mdev->kvm = NULL; and then do (2) vfio_ap_mdev_reset_queues(matrix_mdev->mdev); another thread could do static ssize_t unassign_adapter_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { int ret; unsigned long apid; struct mdev_device *mdev = mdev_from_dev(dev); struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); /* If the guest is running, disallow un-assignment of adapter */ if (matrix_mdev->kvm) return -EBUSY; ... } between (1) and (2), and we would not bail out with -EBUSY because !!kvm because of (1). That means we would change matrix_mdev->matrix and we would not reset the queues that correspond to the apid that was just removed, because by the time we do the reset_queues, the queues are not in the matrix_mdev->matrix any more. Does that make sense?