On Tue, Jul 19 2022, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Jul 07, 2022 at 03:37:16PM -0600, Alex Williamson wrote:
>> On Mon, 4 Jul 2022 21:59:03 -0300
>> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>> > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
>> > index b49e2e9db2dc6f..09e0ce7b72324c 100644
>> > --- a/drivers/s390/cio/vfio_ccw_ops.c
>> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
>> > @@ -44,31 +44,19 @@ static int vfio_ccw_mdev_reset(struct vfio_ccw_private *private)
>> >  	return ret;
>> >  }
>> >
>> > -static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
>> > -				  unsigned long action,
>> > -				  void *data)
>> > +static void vfio_ccw_dma_unmap(struct vfio_device *vdev, u64 iova, u64 length)
>> >  {
>> >  	struct vfio_ccw_private *private =
>> > -		container_of(nb, struct vfio_ccw_private, nb);
>> > -
>> > -	/*
>> > -	 * Vendor drivers MUST unpin pages in response to an
>> > -	 * invalidation.
>> > -	 */
>> > -	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>> > -		struct vfio_iommu_type1_dma_unmap *unmap = data;
>> > -
>> > -		if (!cp_iova_pinned(&private->cp, unmap->iova))
>> > -			return NOTIFY_OK;
>> > +		container_of(vdev, struct vfio_ccw_private, vdev);
>> >
>> > -		if (vfio_ccw_mdev_reset(private))
>> > -			return NOTIFY_BAD;
>> > +	/* Drivers MUST unpin pages in response to an invalidation. */
>> > +	if (!cp_iova_pinned(&private->cp, iova))
>> > +		return;
>> >
>> > -		cp_free(&private->cp);
>> > -		return NOTIFY_OK;
>> > -	}
>> > +	if (vfio_ccw_mdev_reset(private))
>> > +		return;
>> >
>> > -	return NOTIFY_DONE;
>> > +	cp_free(&private->cp);
>> >  }
>>
>>
>> The cp_free() call is gone here with [1], so I think this function now
>> just ends with:
>>
>> 	...
>> 	vfio_ccw_mdev_reset(private);
>> }
>>
>> There are also minor contextual differences elsewhere from that series,
>> so a quick respin to record the changes on list would be appreciated.
>>
>> However the above kind of highlights that NOTIFY_BAD that silently gets
>> dropped here.  I realize we weren't testing the return value of the
>> notifier call chain and really we didn't intend that notifiers could
>> return a failure here, but does this warrant some logging or suggest
>> future work to allow a device to go offline here?  Thanks.
>
> It looks like no.
>
> If the FSM trapped in a bad state here, such as
> VFIO_CCW_STATE_NOT_OPER, then it means it should have already unpinned
> the pages and this is considered a success for this purpose

A rather pathological case would be a subchannel that cannot be quiesced
and does not end up being non-operational; in theory, the hardware could
still try to access the buffers we provided for I/O. I'd say that is
extremely unlikely; we might log it, but we really cannot do anything
else.

>
> The return code here exists only to return to userspace so it can
> detect during a VFIO_DEVICE_RESET that the device has crashed
> irrecoverably.

Does it imply only that ("it's dead, Jim"), or can it also imply a
runaway device? Not that userspace can do much in any case.

>
> Thus just continuing to silently ignore it seems like the best thing.
>
> Jason