[Cc +GVT-g maintainers/lists] On Tue, 22 May 2018 10:13:46 +0200 Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > On Fri, 18 May 2018 13:10:25 -0600 > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > When we create an mdev device, we check for duplicates against the > > parent device and return -EEXIST if found, but the mdev device > > namespace is global since we'll link all devices from the bus. We do > > catch this later in sysfs_do_create_link_sd() to return -EEXIST, but > > with it comes a kernel warning and stack trace for trying to create > > duplicate sysfs links, which makes it an undesirable response. > > > > Therefore we should really be looking for duplicates across all mdev > > parent devices, or as implemented here, against our mdev device list. > > Using mdev_list to prevent duplicates means that we can remove > > mdev_parent.lock, but in order not to serialize mdev device creation > > and removal globally, we add mdev_device.active which allows UUIDs to > > be reserved such that we can drop the mdev_list_lock before the mdev > > device is fully in place. > > > > Two behavioral notes; first, mdev_parent.lock had the side-effect of > > serializing mdev create and remove ops per parent device. This was > > an implementation detail, not an intentional guarantee provided to > > the mdev vendor drivers. Vendor drivers can trivially provide this > > serialization internally if necessary. Second, review comments note > > the new -EAGAIN behavior when the device, and in particular the remove > > attribute, becomes visible in sysfs. If a remove is triggered prior > > to completion of mdev_device_create() the user will see a -EAGAIN > > error. While the errno is different, receiving an error during this > > period is not, the previous implementation returned -ENODEV for the > > same condition. Furthermore, the consistency to the user is improved > > in the case where mdev_device_remove_ops() returns error. Previously > > concurrent calls to mdev_device_remove() could see the device > > disappear with -ENODEV and return in the case of error. Now a user > > would see -EAGAIN while the device is in this transitory state. > > > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx> > > --- > > Documentation/vfio-mediated-device.txt | 5 ++ > > drivers/vfio/mdev/mdev_core.c | 102 +++++++++++--------------------- > > drivers/vfio/mdev/mdev_private.h | 2 - > > 3 files changed, 42 insertions(+), 67 deletions(-) > > Reviewed-by: Cornelia Huck <cohuck@xxxxxxxxxx> > > I think it is better to deal with any possible vendor driver > implications on top of this (I still believe that vfio-ccw is fine). Thanks Cornelia. So if vfio-ccw is fine, presumably NVIDIA is fine, then this leaves GVT-g to see if there's any fallout. Zhenyu & Zhi, I've linked the series under discussion here below[1]. The question to you is the first of the two behavioral notes listed above, does GVT-g have any dependency on the mdev core providing serialization per mdev parent device for the create and remove callbacks within the mdev_parent_ops? This was never an intended feature of the implementation and as noted it should be trivial for for an mdev vendor driver to provide equivalent course grained serialization if necessary. Of course it would be better to implement that sooner rather than later if required. I see that __intel_gvt_create_vgpu() makes use of gvt->lock, which would seem to already provide this level of per-parent locking. The remove path makes use of this same lock, so I think we're ok, but looking for an explicit ack so there are no surprises. I'd like to queue this series for v4.18. Thanks, Alex [1] https://lkml.org/lkml/2018/5/18/1035 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx