On Tue, 19 Jul 2022 17:45:20 +0530 Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > The vfio-pci based drivers will have runtime power management > support where the user can put the device into the low power state > and then PCI devices can go into the D3cold state. If the device is > in the low power state and the user issues any IOCTL, then the > device should be moved out of the low power state first. Once > the IOCTL is serviced, then it can go into the low power state again. > The runtime PM framework manages this with help of usage count. > > One option was to add the runtime PM related API's inside vfio-pci > driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a > different path and more IOCTL can be added in the future. Also, the > runtime PM will be added for vfio-pci based drivers variant currently, > but the other VFIO based drivers can use the same in the > future. So, this patch adds the runtime calls runtime-related API in > the top-level IOCTL function itself. > > For the VFIO drivers which do not have runtime power management > support currently, the runtime PM API's won't be invoked. Only for > vfio-pci based drivers currently, the runtime PM API's will be invoked > to increment and decrement the usage count. In the vfio-pci drivers also, > the variant drivers can opt-out by incrementing the usage count during > device-open. The pm_runtime_resume_and_get() checks the device > current status and will return early if the device is already in the > ACTIVE state. > > Taking this usage count incremented while servicing IOCTL will make > sure that the user won't put the device into the low power state when any > other IOCTL is being serviced in parallel. Let's consider the > following scenario: > > 1. Some other IOCTL is called. > 2. The user has opened another device instance and called the IOCTL for > low power entry. > 3. The low power entry IOCTL moves the device into the low power state. > 4. The other IOCTL finishes. > > If we don't keep the usage count incremented then the device > access will happen between step 3 and 4 while the device has already > gone into the low power state. > > The pm_runtime_resume_and_get() will be the first call so its error > should not be propagated to user space directly. For example, if > pm_runtime_resume_and_get() can return -EINVAL for the cases where the > user has passed the correct argument. So the > pm_runtime_resume_and_get() errors have been masked behind -EIO. > > Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> > --- > drivers/vfio/vfio.c | 52 ++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 49 insertions(+), 3 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index bd84ca7c5e35..1d005a0a9d3d 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -32,6 +32,7 @@ > #include <linux/vfio.h> > #include <linux/wait.h> > #include <linux/sched/signal.h> > +#include <linux/pm_runtime.h> > #include "vfio.h" > > #define DRIVER_VERSION "0.3" > @@ -1335,6 +1336,39 @@ static const struct file_operations vfio_group_fops = { > .release = vfio_group_fops_release, > }; > > +/* > + * Wrapper around pm_runtime_resume_and_get(). > + * Return error code on failure or 0 on success. > + */ > +static inline int vfio_device_pm_runtime_get(struct vfio_device *device) > +{ > + struct device *dev = device->dev; > + > + if (dev->driver && dev->driver->pm) { > + int ret; > + > + ret = pm_runtime_resume_and_get(dev); > + if (ret < 0) { Nit, pm_runtime_resume_and_get() cannot return a positive value, it's either zero or -errno, so we could just test (ret). Thanks, Alex > + dev_info_ratelimited(dev, > + "vfio: runtime resume failed %d\n", ret); > + return -EIO; > + } > + } > + > + return 0; > +} > + > +/* > + * Wrapper around pm_runtime_put(). > + */ > +static inline void vfio_device_pm_runtime_put(struct vfio_device *device) > +{ > + struct device *dev = device->dev; > + > + if (dev->driver && dev->driver->pm) > + pm_runtime_put(dev); > +} > + > /* > * VFIO Device fd > */ > @@ -1649,15 +1683,27 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, > unsigned int cmd, unsigned long arg) > { > struct vfio_device *device = filep->private_data; > + int ret; > + > + ret = vfio_device_pm_runtime_get(device); > + if (ret) > + return ret; > > switch (cmd) { > case VFIO_DEVICE_FEATURE: > - return vfio_ioctl_device_feature(device, (void __user *)arg); > + ret = vfio_ioctl_device_feature(device, (void __user *)arg); > + break; > + > default: > if (unlikely(!device->ops->ioctl)) > - return -EINVAL; > - return device->ops->ioctl(device, cmd, arg); > + ret = -EINVAL; > + else > + ret = device->ops->ioctl(device, cmd, arg); > + break; > } > + > + vfio_device_pm_runtime_put(device); > + return ret; > } > > static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,