Re: [PATCH 2/3] s390/vfio-ap: introduce two new r/w locks to replace wait_queue_head_t

David Hildenbrand <david@xxxxxxxxxx> · Fri, 11 Jun 2021 19:11:50 +0200

On 11.06.21 19:05, Jason Gunthorpe wrote:
On Wed, Jun 09, 2021 at 06:46:33PM -0400, Tony Krowiak wrote:
This patch introduces two new r/w locks to replace the wait_queue_head_t
that was introduced to fix a lockdep splat reported when testing
pass-through of AP queues to a Secure Execution guest. This is the
abbreviated dependency chain that lockdep reported and that the wait
queue fixed:

kvm_arch_crypto_set_masks+0x4a/0x2b8 [kvm]        kvm->lock
vfio_ap_mdev_group_notifier+0x154/0x170 [vfio_ap] matrix_dev->lock

handle_pqap+0x56/0x1d0 [vfio_ap]    matrix_dev->lock
kvm_vcpu_ioctl+0x2cc/0x898 [kvm]    vcpu->mutex

kvm_s390_cpus_to_pv+0x4e/0xf8 [kvm]   vcpu->mutex
kvm_arch_vm_ioctl+0x3ec/0x550 [kvm]   kvm->lock
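Read each pair above as "the lower lock was held when the upper one was taken". The three pairs then say matrix_dev->lock is taken before kvm->lock, kvm->lock before vcpu->mutex, and vcpu->mutex before matrix_dev->lock, which is a cycle. The sketch below is a hypothetical toy lock-order graph with a depth-first cycle check that makes this concrete. It is not lockdep's implementation, and the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a lock-order graph (not lockdep's data structures). */
enum lock_id { KVM_LOCK, MATRIX_DEV_LOCK, VCPU_MUTEX, NR_LOCKS };

/* g[a][b] is true when lock b has been acquired while lock a was held. */
typedef bool order_graph[NR_LOCKS][NR_LOCKS];

/* DFS: state 0 = unvisited, 1 = on the current path, 2 = fully explored. */
static bool visit(order_graph g, int n, int *state)
{
	state[n] = 1;
	for (int m = 0; m < NR_LOCKS; m++) {
		if (!g[n][m])
			continue;
		if (state[m] == 1)
			return true;	/* back edge: the ordering has a cycle */
		if (state[m] == 0 && visit(g, m, state))
			return true;
	}
	state[n] = 2;
	return false;
}

/* True if the recorded orderings contain a cycle, i.e. a potential deadlock. */
static bool has_cycle(order_graph g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}
```

Filling in the three pairs yields a cycle. In this model, flipping the matrix_dev->lock/kvm->lock edge so that kvm->lock comes first leaves the graph acyclic.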

Is the problem larger than kvm_arch_crypto_set_masks()? If not, it
looks easy enough to fix: just pull the kvm->lock out of
kvm_arch_crypto_set_masks() and obtain it in vfio_ap_mdev_set_kvm()
before the rwsem. Then your locks are in the right order and all should
be well?
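A minimal userspace sketch of the suggested ordering follows, using pthread primitives as stand-ins. kvm_lock models kvm->lock and mdev_rwsem models the new rwsem. set_masks_locked() is a hypothetical version of kvm_arch_crypto_set_masks() that expects its caller to hold kvm_lock; in the kernel, a lockdep_assert_held() would go there. set_kvm() models vfio_ap_mdev_set_kvm() taking both locks outer-first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins: kvm_lock for kvm->lock, mdev_rwsem for the new rwsem. */
static pthread_mutex_t kvm_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t mdev_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static int crypto_masks_set;

/* Model of kvm_arch_crypto_set_masks() with the locking pulled out:
 * the caller must already hold kvm_lock. */
static void set_masks_locked(void)
{
	/* Rough stand-in for lockdep_assert_held(): trylock on a held
	 * default mutex fails with EBUSY, even for the owning thread. */
	int rc = pthread_mutex_trylock(&kvm_lock);

	assert(rc == EBUSY);
	(void)rc;
	crypto_masks_set = 1;
}

/* Model of vfio_ap_mdev_set_kvm(): kvm_lock first, then the rwsem,
 * released in reverse order. */
static void set_kvm(void)
{
	pthread_mutex_lock(&kvm_lock);
	pthread_rwlock_wrlock(&mdev_rwsem);
	set_masks_locked();
	pthread_rwlock_unlock(&mdev_rwsem);
	pthread_mutex_unlock(&kvm_lock);
}
```

The design choice is simply that every path takes kvm_lock before the rwsem, so no path can hold the rwsem while waiting for kvm_lock.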

+static int vfio_ap_mdev_matrix_store_lock(struct ap_matrix_mdev *matrix_mdev)
+{
+	if (!down_write_trylock(&matrix_mdev->rwsem))
+		return -EBUSY;
+
+	if (matrix_mdev->kvm) {
+		up_write(&matrix_mdev->rwsem);
+		return -EBUSY;
+	}
+
+	if (!down_write_trylock(&matrix_mdev->matrix.rwsem)) {
+		up_write(&matrix_mdev->rwsem);
+		return -EBUSY;
+	}
+
+	return 0;
+}

This double locking is quite strange; at the very least it deserves a
detailed comment. The comments suggest these locks protect distinct data, so...
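Here is one way such a comment could read, sketched in a userspace model. The struct and field names are hypothetical, and pthread rwlocks stand in for the kernel rw_semaphores. The split of responsibilities, with the outer lock guarding ->kvm and the inner lock guarding the masks, is an assumption drawn from the kvm check in the helper above; the patch does not state it. The sketch also adds a matching unlock that releases in reverse order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of the parts of struct ap_matrix_mdev the helper touches. */
struct matrix_model {
	pthread_rwlock_t rwsem;	/* assumed: guards the masks (aqm, ...) */
	unsigned long aqm;
};

struct mdev_model {
	pthread_rwlock_t rwsem;	/* assumed: guards ->kvm */
	void *kvm;
	struct matrix_model matrix;
};

/*
 * Lock nesting: m->rwsem (outer) is always taken before m->matrix.rwsem
 * (inner). Holding the outer lock keeps ->kvm stable, so a guest cannot
 * be attached while the masks are edited under the inner lock.
 */
static int store_lock(struct mdev_model *m)
{
	if (pthread_rwlock_trywrlock(&m->rwsem))
		return -EBUSY;
	if (m->kvm) {		/* refuse edits while a guest is attached */
		pthread_rwlock_unlock(&m->rwsem);
		return -EBUSY;
	}
	if (pthread_rwlock_trywrlock(&m->matrix.rwsem)) {
		pthread_rwlock_unlock(&m->rwsem);
		return -EBUSY;
	}
	return 0;
}

/* Release in the reverse order of acquisition. */
static void store_unlock(struct mdev_model *m)
{
	pthread_rwlock_unlock(&m->matrix.rwsem);
	pthread_rwlock_unlock(&m->rwsem);
}
```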

+
+	ret = vfio_ap_mdev_matrix_store_lock(matrix_mdev);
+	if (ret)
+		return ret;

  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);

Here it obtains both locks but only touches matrix.aqm, which is only
protected by the inner lock. What was the point of obtaining the
outer lock?

Also, I'm not convinced down_write_trylock() is appropriate from a sysfs
callback; surely it should block and wait? Otherwise userspace gets
random, racy failures depending on what the kernel happens to be doing.

It might be worth exploring lock_device_hotplug_sysfs(), which does a
"return restart_syscall()" after a short delay.

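For reference, lock_device_hotplug_sysfs() in drivers/base/core.c tries the mutex. On contention it sleeps about 5 ms and returns restart_syscall(), so the kernel transparently re-issues the write instead of failing it. The sketch below models that pattern in userspace. MODEL_ERESTART stands in for -ERESTARTNOINTR, the retry loop stands in for the syscall restart, and clear_aqm_bit() is a hypothetical store handler. None of these names exist in the vfio_ap driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for the kernel's ERESTARTNOINTR (513), which userspace never sees. */
#define MODEL_ERESTART 513

static pthread_rwlock_t store_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long aqm = ~0UL;	/* hypothetical stand-in for matrix.aqm */

/* Modeled on lock_device_hotplug_sysfs(): try the lock; if it is busy,
 * sleep ~5 ms to avoid busy looping and ask for a restart. */
static int store_lock_sysfs(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };

	if (pthread_rwlock_trywrlock(&store_rwsem) == 0)
		return 0;
	nanosleep(&delay, NULL);
	return -MODEL_ERESTART;
}

/* Hypothetical store handler. The loop plays the role of the syscall
 * restart, so the caller never sees a spurious -EBUSY. */
static int clear_aqm_bit(unsigned int bit)
{
	while (store_lock_sysfs() == -MODEL_ERESTART)
		;
	aqm &= ~(1UL << bit);
	pthread_rwlock_unlock(&store_rwsem);
	return 0;
}
```

Unlike a plain down_write_trylock() that returns -EBUSY to userspace, contention here only costs the writer a short delay.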
--
Thanks,

David / dhildenb