On Mon, 3 May 2021 16:14:43 -0400 Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote: > On 4/26/21 1:48 PM, Cornelia Huck wrote: > > On Fri, 23 Apr 2021 20:03:03 -0300 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > >> This is straightforward conversion, the ap_matrix_mdev is actually serving > >> as the vfio_device and we can replace all the mdev_get_drvdata()'s with a > >> simple container_of(). > >> > >> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > >> --- > >> drivers/s390/crypto/vfio_ap_ops.c | 137 ++++++++++++++++---------- > >> drivers/s390/crypto/vfio_ap_private.h | 2 + > >> 2 files changed, 89 insertions(+), 50 deletions(-) > >> > > (...) > > > >> -static int vfio_ap_mdev_remove(struct mdev_device *mdev) > >> +static void vfio_ap_mdev_remove(struct mdev_device *mdev) > >> { > >> - struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > >> + struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev); > >> > >> - if (matrix_mdev->kvm) > >> - return -EBUSY; > >> + /* FIXME: Remove isn't allowed to fail */ > >> + if (WARN_ON(matrix_mdev->kvm)) > >> + return; > > This is a pre-existing problem, but the rework now makes it more > > obvious. > > I agree, I was not aware that returning a non-zero return code > from this callback did not return the -EBUSY to userspace > when the mdev is removed. > > > > > Previously, the mdev code would only print a warning and then continue > > with device removal, even if a ->remove() callback returned an error. > > Now, it's quite clear that we'll end up in a weird half-dead state. > > With the latest kernel from our tree, the remove hangs until the > guest is shutdown and the mdev fd is closed. During the hang, the > dmesg log has the following: > > "No mdev vendor driver request callback support, blocked until released > by user" > > So, it looks like nothing is done with the mdev until the fd for the > mdev is closed when the guest is shut down, at which time the > mdev is removed. You probably want to wire up the request callback and notify userspace. What happens today if the device in QEMU is removed via device_del? Does that already clean up things properly?