On Fri, 1 Jul 2022 16:38:11 +0530 Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > The vfio-pci based driver will have runtime power management > support where the user can put the device into the low power state > and then PCI devices can go into the D3cold state. If the device is > in the low power state and the user issues any IOCTL, then the > device should be moved out of the low power state first. Once > the IOCTL is serviced, then it can go into the low power state again. > The runtime PM framework manages this with help of usage count. > > One option was to add the runtime PM related API's inside vfio-pci > driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a > different path and more IOCTL can be added in the future. Also, the > runtime PM will be added for vfio-pci based drivers variant currently, > but the other VFIO based drivers can use the same in the > future. So, this patch adds the runtime calls runtime-related API in > the top-level IOCTL function itself. > > For the VFIO drivers which do not have runtime power management > support currently, the runtime PM API's won't be invoked. Only for > vfio-pci based drivers currently, the runtime PM API's will be invoked > to increment and decrement the usage count. Variant drivers can easily opt-out of runtime pm support by performing a gratuitous pm-get in their device-open function. > Taking this usage count incremented while servicing IOCTL will make > sure that the user won't put the device into low power state when any > other IOCTL is being serviced in parallel. Let's consider the > following scenario: > > 1. Some other IOCTL is called. > 2. The user has opened another device instance and called the power > management IOCTL for the low power entry. > 3. The power management IOCTL moves the device into the low power state. > 4. The other IOCTL finishes. > > If we don't keep the usage count incremented then the device > access will happen between step 3 and 4 while the device has already > gone into the low power state. > > The runtime PM API's should not be invoked for > VFIO_DEVICE_FEATURE_POWER_MANAGEMENT since this IOCTL itself performs > the runtime power management entry and exit for the VFIO device. I think the one-shot interface I proposed in the previous patch avoids the need for special handling for these feature ioctls. Thanks, Alex > The pm_runtime_resume_and_get() will be the first call so its error > should not be propagated to user space directly. For example, if > pm_runtime_resume_and_get() can return -EINVAL for the cases where the > user has passed the correct argument. So the > pm_runtime_resume_and_get() errors have been masked behind -EIO. > > Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> > --- > drivers/vfio/vfio.c | 82 ++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 74 insertions(+), 8 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index 61e71c1154be..61a8d9f7629a 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -32,6 +32,7 @@ > #include <linux/vfio.h> > #include <linux/wait.h> > #include <linux/sched/signal.h> > +#include <linux/pm_runtime.h> > #include "vfio.h" > > #define DRIVER_VERSION "0.3" > @@ -1333,6 +1334,39 @@ static const struct file_operations vfio_group_fops = { > .release = vfio_group_fops_release, > }; > > +/* > + * Wrapper around pm_runtime_resume_and_get(). > + * Return error code on failure or 0 on success. > + */ > +static inline int vfio_device_pm_runtime_get(struct vfio_device *device) > +{ > + struct device *dev = device->dev; > + > + if (dev->driver && dev->driver->pm) { > + int ret; > + > + ret = pm_runtime_resume_and_get(dev); > + if (ret < 0) { > + dev_info_ratelimited(dev, > + "vfio: runtime resume failed %d\n", ret); > + return -EIO; > + } > + } > + > + return 0; > +} > + > +/* > + * Wrapper around pm_runtime_put(). > + */ > +static inline void vfio_device_pm_runtime_put(struct vfio_device *device) > +{ > + struct device *dev = device->dev; > + > + if (dev->driver && dev->driver->pm) > + pm_runtime_put(dev); > +} > + > /* > * VFIO Device fd > */ > @@ -1607,6 +1641,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device, > { > size_t minsz = offsetofend(struct vfio_device_feature, flags); > struct vfio_device_feature feature; > + int ret = 0; > + u16 feature_cmd; > > if (copy_from_user(&feature, arg, minsz)) > return -EFAULT; > @@ -1626,28 +1662,51 @@ static int vfio_ioctl_device_feature(struct vfio_device *device, > (feature.flags & VFIO_DEVICE_FEATURE_GET)) > return -EINVAL; > > - switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) { > + feature_cmd = feature.flags & VFIO_DEVICE_FEATURE_MASK; > + > + /* > + * The VFIO_DEVICE_FEATURE_POWER_MANAGEMENT itself performs the runtime > + * power management entry and exit for the VFIO device, so the runtime > + * PM API's should not be called for this feature. > + */ > + if (feature_cmd != VFIO_DEVICE_FEATURE_POWER_MANAGEMENT) { > + ret = vfio_device_pm_runtime_get(device); > + if (ret) > + return ret; > + } > + > + switch (feature_cmd) { > case VFIO_DEVICE_FEATURE_MIGRATION: > - return vfio_ioctl_device_feature_migration( > + ret = vfio_ioctl_device_feature_migration( > device, feature.flags, arg->data, > feature.argsz - minsz); > + break; > case VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE: > - return vfio_ioctl_device_feature_mig_device_state( > + ret = vfio_ioctl_device_feature_mig_device_state( > device, feature.flags, arg->data, > feature.argsz - minsz); > + break; > default: > if (unlikely(!device->ops->device_feature)) > - return -EINVAL; > - return device->ops->device_feature(device, feature.flags, > - arg->data, > - feature.argsz - minsz); > + ret = -EINVAL; > + else > + ret = device->ops->device_feature( > + device, feature.flags, arg->data, > + feature.argsz - minsz); > + break; > } > + > + if (feature_cmd != VFIO_DEVICE_FEATURE_POWER_MANAGEMENT) > + vfio_device_pm_runtime_put(device); > + > + return ret; > } > > static long vfio_device_fops_unl_ioctl(struct file *filep, > unsigned int cmd, unsigned long arg) > { > struct vfio_device *device = filep->private_data; > + int ret; > > switch (cmd) { > case VFIO_DEVICE_FEATURE: > @@ -1655,7 +1714,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, > default: > if (unlikely(!device->ops->ioctl)) > return -EINVAL; > - return device->ops->ioctl(device, cmd, arg); > + > + ret = vfio_device_pm_runtime_get(device); > + if (ret) > + return ret; > + > + ret = device->ops->ioctl(device, cmd, arg); > + vfio_device_pm_runtime_put(device); > + return ret; > } > } >