On 7/6/2022 9:10 PM, Alex Williamson wrote: > On Fri, 1 Jul 2022 16:38:11 +0530 > Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > >> The vfio-pci based driver will have runtime power management >> support where the user can put the device into the low power state >> and then PCI devices can go into the D3cold state. If the device is >> in the low power state and the user issues any IOCTL, then the >> device should be moved out of the low power state first. Once >> the IOCTL is serviced, then it can go into the low power state again. >> The runtime PM framework manages this with help of usage count. >> >> One option was to add the runtime PM related API's inside vfio-pci >> driver but some IOCTL (like VFIO_DEVICE_FEATURE) can follow a >> different path and more IOCTL can be added in the future. Also, the >> runtime PM will be added for vfio-pci based drivers variant currently, >> but the other VFIO based drivers can use the same in the >> future. So, this patch adds the runtime calls runtime-related API in >> the top-level IOCTL function itself. >> >> For the VFIO drivers which do not have runtime power management >> support currently, the runtime PM API's won't be invoked. Only for >> vfio-pci based drivers currently, the runtime PM API's will be invoked >> to increment and decrement the usage count. > > Variant drivers can easily opt-out of runtime pm support by performing > a gratuitous pm-get in their device-open function. > Do I need to add this line in the commit message? >> Taking this usage count incremented while servicing IOCTL will make >> sure that the user won't put the device into low power state when any >> other IOCTL is being serviced in parallel. Let's consider the >> following scenario: >> >> 1. Some other IOCTL is called. >> 2. The user has opened another device instance and called the power >> management IOCTL for the low power entry. >> 3. The power management IOCTL moves the device into the low power state. >> 4. The other IOCTL finishes. >> >> If we don't keep the usage count incremented then the device >> access will happen between step 3 and 4 while the device has already >> gone into the low power state. >> >> The runtime PM API's should not be invoked for >> VFIO_DEVICE_FEATURE_POWER_MANAGEMENT since this IOCTL itself performs >> the runtime power management entry and exit for the VFIO device. > > I think the one-shot interface I proposed in the previous patch avoids > the need for special handling for these feature ioctls. Thanks, > Okay. So, for low power exit case (means feature GET ioctl in the updated case) also, we will trigger eventfd. Correct? Thanks, Abhishek > Alex > >> The pm_runtime_resume_and_get() will be the first call so its error >> should not be propagated to user space directly. For example, if >> pm_runtime_resume_and_get() can return -EINVAL for the cases where the >> user has passed the correct argument. So the >> pm_runtime_resume_and_get() errors have been masked behind -EIO. >> >> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> >> --- >> drivers/vfio/vfio.c | 82 ++++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 74 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c >> index 61e71c1154be..61a8d9f7629a 100644 >> --- a/drivers/vfio/vfio.c >> +++ b/drivers/vfio/vfio.c >> @@ -32,6 +32,7 @@ >> #include <linux/vfio.h> >> #include <linux/wait.h> >> #include <linux/sched/signal.h> >> +#include <linux/pm_runtime.h> >> #include "vfio.h" >> >> #define DRIVER_VERSION "0.3" >> @@ -1333,6 +1334,39 @@ static const struct file_operations vfio_group_fops = { >> .release = vfio_group_fops_release, >> }; >> >> +/* >> + * Wrapper around pm_runtime_resume_and_get(). >> + * Return error code on failure or 0 on success. >> + */ >> +static inline int vfio_device_pm_runtime_get(struct vfio_device *device) >> +{ >> + struct device *dev = device->dev; >> + >> + if (dev->driver && dev->driver->pm) { >> + int ret; >> + >> + ret = pm_runtime_resume_and_get(dev); >> + if (ret < 0) { >> + dev_info_ratelimited(dev, >> + "vfio: runtime resume failed %d\n", ret); >> + return -EIO; >> + } >> + } >> + >> + return 0; >> +} >> + >> +/* >> + * Wrapper around pm_runtime_put(). >> + */ >> +static inline void vfio_device_pm_runtime_put(struct vfio_device *device) >> +{ >> + struct device *dev = device->dev; >> + >> + if (dev->driver && dev->driver->pm) >> + pm_runtime_put(dev); >> +} >> + >> /* >> * VFIO Device fd >> */ >> @@ -1607,6 +1641,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device, >> { >> size_t minsz = offsetofend(struct vfio_device_feature, flags); >> struct vfio_device_feature feature; >> + int ret = 0; >> + u16 feature_cmd; >> >> if (copy_from_user(&feature, arg, minsz)) >> return -EFAULT; >> @@ -1626,28 +1662,51 @@ static int vfio_ioctl_device_feature(struct vfio_device *device, >> (feature.flags & VFIO_DEVICE_FEATURE_GET)) >> return -EINVAL; >> >> - switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) { >> + feature_cmd = feature.flags & VFIO_DEVICE_FEATURE_MASK; >> + >> + /* >> + * The VFIO_DEVICE_FEATURE_POWER_MANAGEMENT itself performs the runtime >> + * power management entry and exit for the VFIO device, so the runtime >> + * PM API's should not be called for this feature. >> + */ >> + if (feature_cmd != VFIO_DEVICE_FEATURE_POWER_MANAGEMENT) { >> + ret = vfio_device_pm_runtime_get(device); >> + if (ret) >> + return ret; >> + } >> + >> + switch (feature_cmd) { >> case VFIO_DEVICE_FEATURE_MIGRATION: >> - return vfio_ioctl_device_feature_migration( >> + ret = vfio_ioctl_device_feature_migration( >> device, feature.flags, arg->data, >> feature.argsz - minsz); >> + break; >> case VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE: >> - return vfio_ioctl_device_feature_mig_device_state( >> + ret = vfio_ioctl_device_feature_mig_device_state( >> device, feature.flags, arg->data, >> feature.argsz - minsz); >> + break; >> default: >> if (unlikely(!device->ops->device_feature)) >> - return -EINVAL; >> - return device->ops->device_feature(device, feature.flags, >> - arg->data, >> - feature.argsz - minsz); >> + ret = -EINVAL; >> + else >> + ret = device->ops->device_feature( >> + device, feature.flags, arg->data, >> + feature.argsz - minsz); >> + break; >> } >> + >> + if (feature_cmd != VFIO_DEVICE_FEATURE_POWER_MANAGEMENT) >> + vfio_device_pm_runtime_put(device); >> + >> + return ret; >> } >> >> static long vfio_device_fops_unl_ioctl(struct file *filep, >> unsigned int cmd, unsigned long arg) >> { >> struct vfio_device *device = filep->private_data; >> + int ret; >> >> switch (cmd) { >> case VFIO_DEVICE_FEATURE: >> @@ -1655,7 +1714,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, >> default: >> if (unlikely(!device->ops->ioctl)) >> return -EINVAL; >> - return device->ops->ioctl(device, cmd, arg); >> + >> + ret = vfio_device_pm_runtime_get(device); >> + if (ret) >> + return ret; >> + >> + ret = device->ops->ioctl(device, cmd, arg); >> + vfio_device_pm_runtime_put(device); >> + return ret; >> } >> } >> >