The simple solution in sketch is just this:
The code below results in a lockdep WARN_ON for the reasons documented in my comments interspersed in the code. After trying several different permutations using RCU and an rw_semaphore, I came to the conclusion that setting and clearing the hook pointer while the mdev fd is being open and closed or when the mdev is being removed unnecessarily complicates things. There is no good reason to set/clear the function pointer at this time, nor is there any compelling reason to store the function pointer in a satellite structure of the kvm struct. Since the hook function's lifespan coincides with the lifespan of the vfio_ap module, why not store it when the module is loaded and clear it when the module is unloaded? Access to the function pointer can be controlled by a lock that is independent of the matrix_dev->lock, thus avoiding potential lock dependencies. Access to the mdev is controlled by the matrix_dev lock, so if the mdev is retrieved from the matrix_dev->mdev_list in the hook function, then we are assured that the mdev will never be accessed after it is freed; problem solved.
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 9928f785c6773a..f70386452367dd 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu) * Verify that the hook callback is registered, lock the owner * and call the hook. */ - if (vcpu->kvm->arch.crypto.pqap_hook) { - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) - return -EOPNOTSUPP; + if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) && + vcpu->kvm->arch.crypto.pqap_hook) { ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
So, the hook function (handle_pqap in vfio_ap_ops.c) is executed while holding the rwsem lock. The hook function tries to lock the matrix_dev->lock mutex.
- module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) kvm_s390_set_psw_cc(vcpu, 3); + up_read(&vcpu->kv->arch.crypto.rwsem); return ret; } /* diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index b2c7e10dfdcdcf..64c89f6a711e94 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); init_waitqueue_head(&matrix_mdev->wait_for_kvm); mdev_set_drvdata(mdev, matrix_mdev); - matrix_mdev->pqap_hook.hook = handle_pqap; - matrix_mdev->pqap_hook.owner = THIS_MODULE; + down_write(&&vcpu->kvm->arch.crypto.rwsem); mutex_lock(&matrix_dev->lock); list_add(&matrix_mdev->node, &matrix_dev->mdev_list); mutex_unlock(&matrix_dev->lock); @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, matrix_mdev->matrix.aqm, matrix_mdev->matrix.adm); mutex_lock(&matrix_dev->lock);
Locks the matrix_dev->lock mutex, then tries to lock rwsem to store the pqap_hook under the rwsem lock. During testing, this occurred while the interception of the PQAP instruction was taking place, so the read lock was already taken and the hook function was waiting on the matrix_dev->lock taken above. All of the test cases ran successfully to completion, so there didn't appear to be a deadlock of any sort, but lockdep apparently detected a problem. I was able to get around this by doing the down_write and setting the hook vfio_ap_mdev_group_notifier function before calling this function (above) and before taking the matrix_dev->lock, but that circumvents the protection against accessing a matrix_dev that's already been freed.
+ down_write(&kvm->arch.crypto.rwsem); kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; + up_write(&kvm->arch.crypto.rwsem); matrix_mdev->kvm = kvm; matrix_mdev->kvm_busy = false; wake_up_all(&matrix_mdev->wait_for_kvm); @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) kvm_arch_crypto_clear_masks(matrix_mdev->kvm); mutex_lock(&matrix_dev->lock); vfio_ap_mdev_reset_queues(matrix_mdev->mdev); + down_write(&matrix_mdev->kvm->arch.crypto.rwsem);
Same scenario here.
matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; + up_write(&matrix_mdev->kvm->arch.crypto.rwsem); kvm_put_kvm(matrix_mdev->kvm); matrix_mdev->kvm = NULL; matrix_mdev->kvm_busy = false; Jason