On Thu, Sep 26, 2019 at 10:26:08AM -0600, Alex Williamson wrote: > On Thu, 26 Sep 2019 11:46:55 -0400 > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: > > > On Wed, Sep 25, 2019 at 10:30:28AM -0600, Alex Williamson wrote: > > > On Wed, 25 Sep 2019 10:11:00 -0400 > > > Rob Miller <rob.miller@xxxxxxxxxxxx> wrote: > > > > > > On Tue, 24 Sep 2019 21:53:29 +0800 > > > > > > Jason Wang <jasowang@xxxxxxxxxx> wrote: > > > > > > > diff --git a/drivers/vfio/mdev/vfio_mdev.c > > > > > > b/drivers/vfio/mdev/vfio_mdev.c > > > > > > > index 891cf83a2d9a..95efa054442f 100644 > > > > > > > --- a/drivers/vfio/mdev/vfio_mdev.c > > > > > > > +++ b/drivers/vfio/mdev/vfio_mdev.c > > > > > > > @@ -14,6 +14,7 @@ > > > > > > > #include <linux/slab.h> > > > > > > > #include <linux/vfio.h> > > > > > > > #include <linux/mdev.h> > > > > > > > +#include <linux/vfio_mdev.h> > > > > > > > > > > > > > > #include "mdev_private.h" > > > > > > > > > > > > > > @@ -24,16 +25,16 @@ > > > > > > > static int vfio_mdev_open(void *device_data) > > > > > > > { > > > > > > > struct mdev_device *mdev = device_data; > > > > > > > - struct mdev_parent *parent = mdev->parent; > > > > > > > + const struct vfio_mdev_device_ops *ops = > > > > > > mdev_get_dev_ops(mdev); > > > > > > > int ret; > > > > > > > > > > > > > > - if (unlikely(!parent->ops->open)) > > > > > > > + if (unlikely(!ops->open)) > > > > > > > return -EINVAL; > > > > > > > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > > > > return -ENODEV; > > > > > > > > > > > > > RJM>] My understanding lately is that this call to > > > > try_module_get(THIS_MODULE) is no longer needed as is considered as a > > > > latent bug. > > > > Quote from > > > > https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put > > > > : > > > > There are a number of uses of try_module_get(THIS_MODULE) in the kernel > > > > source but most if not all of them are latent bugs that should be cleaned > > > > up. > > > > > > This use seems to fall exactly into the case where it is necessary, the > > > open here is not a direct VFS call, it's an internal interface between > > > modules. The user is interacting with filesystem objects from the vfio > > > module and the module reference we're trying to acquire here is to the > > > vfio-mdev module. Thanks, > > > > > > Alex > > > > > > I think the latent bug refers not to module get per se, > > but to the module_put tied to it. E.g.: > > > > static void vfio_mdev_release(void *device_data) > > { > > struct mdev_device *mdev = device_data; > > struct mdev_parent *parent = mdev->parent; > > > > if (likely(parent->ops->release)) > > parent->ops->release(mdev); > > > > module_put(THIS_MODULE); > > > > Does anything prevent the module from unloading at this point? > > if not then ... > > > > > > } > > > > it looks like the implicit return (with instructions for argument pop > > and functuon return) here can get overwritten on module > > unload, causing a crash when executed. > > > > IOW there's generally no way for module to keep a reference > > to itself: it can take a reference but it needs someone else > > to keep it and put. > > I'd always assumed this would exit cleanly, but perhaps there is a > latent race there. In any case, taking a module reference within the > module in this case is better than not doing so, as the latter would > potentially allow the module to be removed at any point in time, while > the former only seems to expose acquire and release gaps. Add it to > the todo list. Thanks, > > Alex Right. I agree with the stack overflow quote: as this example seems to show this is a latent bug. But I also agree that just removing the reference isn't the right way to clean it up. -- MST _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx