On 7/22/2022 4:04 AM, Alex Williamson wrote:
> On Tue, 19 Jul 2022 17:45:23 +0530
> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote:
>
>> This patch implements the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
>> device feature. With VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is
>> any access to the VFIO device on the host side, the device will be
>> moved out of the low power state without the involvement of the user's
>> guest driver. Once the access has finished, the device will be moved
>> back into the low power state. When low power entry happens through
>> VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device will not be
>> moved back into the low power state; instead, a notification will be
>> sent to the user by triggering the wakeup eventfd.
>>
>> vfio_pci_core_pm_entry() will be called for both variants of the low
>> power feature entry, so add an extra argument for the wakeup eventfd
>> context and store it locally in 'struct vfio_pci_core_device'.
>>
>> For entry without a wakeup eventfd, all the exit related handling will
>> be done only by the LOW_POWER_EXIT device feature. When LOW_POWER_EXIT
>> is called, the vfio core layer vfio_device_pm_runtime_get() will
>> increment the usage count and resume the device. In the driver
>> runtime_resume callback, 'pm_wake_eventfd_ctx' will be NULL, so
>> vfio_pci_runtime_pm_exit() will return early. Then
>> vfio_pci_core_pm_exit() will call vfio_pci_runtime_pm_exit() again, and
>> this time the exit related handling will be done.
>>
>> For entry with a wakeup eventfd, the eventfd will be triggered and all
>> the exit related handling will be done in the driver resume callback.
>> When vfio_pci_runtime_pm_exit() is later called by
>> vfio_pci_core_pm_exit(), it will return early.
>> But if the user has disabled runtime PM on the host side, the device
>> will never go into the runtime suspended state; in this case, all the
>> exit related handling will be done only during vfio_pci_core_pm_exit().
>> Also, the eventfd will not be triggered since the device power state
>> has not been changed by the host driver.
>>
>> For vfio_pci_core_disable() also, all the exit related handling needs
>> to be done if the user has closed the device after putting it into low
>> power. In this case, the eventfd will not be triggered since the device
>> close has been initiated by the user.
>>
>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
>> ---
>>  drivers/vfio/pci/vfio_pci_core.c | 78 ++++++++++++++++++++++++++++++--
>>  include/linux/vfio_pci_core.h    |  1 +
>>  2 files changed, 74 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 726a6f282496..dbe942bcaa67 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -259,7 +259,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
>>  	return ret;
>>  }
>>
>> -static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>> +static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
>> +				     struct eventfd_ctx *efdctx)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>> @@ -272,6 +273,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>>  	}
>>
>>  	vdev->pm_runtime_engaged = true;
>> +	vdev->pm_wake_eventfd_ctx = efdctx;
>>  	pm_runtime_put_noidle(&vdev->pdev->dev);
>>  	up_write(&vdev->memory_lock);
>>
>> @@ -295,21 +297,67 @@ static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
>>  	 * while returning from the ioctl and then the device can go into
>>  	 * runtime suspended state.
>>  	 */
>> -	return vfio_pci_runtime_pm_entry(vdev);
>> +	return vfio_pci_runtime_pm_entry(vdev, NULL);
>>  }
>>
>> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>> +static int
>> +vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>> +				   void __user *arg, size_t argsz)
>> +{
>> +	struct vfio_pci_core_device *vdev =
>> +		container_of(device, struct vfio_pci_core_device, vdev);
>> +	struct vfio_device_low_power_entry_with_wakeup entry;
>> +	struct eventfd_ctx *efdctx;
>> +	int ret;
>> +
>> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
>> +				 sizeof(entry));
>> +	if (ret != 1)
>> +		return ret;
>> +
>> +	if (copy_from_user(&entry, arg, sizeof(entry)))
>> +		return -EFAULT;
>> +
>> +	if (entry.wakeup_eventfd < 0)
>> +		return -EINVAL;
>> +
>> +	efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
>> +	if (IS_ERR(efdctx))
>> +		return PTR_ERR(efdctx);
>> +
>> +	ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
>> +	if (ret)
>> +		eventfd_ctx_put(efdctx);
>> +
>> +	return ret;
>> +}
>> +
>> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
>> +				     bool resume_callback)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>>  	 * semaphore.
>>  	 */
>>  	down_write(&vdev->memory_lock);
>> +	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
>> +		up_write(&vdev->memory_lock);
>> +		return;
>> +	}
>> +
>>  	if (vdev->pm_runtime_engaged) {
>>  		vdev->pm_runtime_engaged = false;
>>  		pm_runtime_get_noresume(&vdev->pdev->dev);
>>  	}
>>
>> +	if (vdev->pm_wake_eventfd_ctx) {
>> +		if (resume_callback)
>> +			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>> +
>> +		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
>> +		vdev->pm_wake_eventfd_ctx = NULL;
>> +	}
>> +
>>  	up_write(&vdev->memory_lock);
>>  }
>>
>
> I find the pm_exit handling here confusing. We only have one caller
> that can signal the eventfd, so it seems cleaner to me to have that
> caller do the eventfd signal.
> We can then remove the arg to pm_exit and pull the core of it out to a
> pre-locked function for that call path. Something like below (applies
> on top of this patch). Also moved the intx unmasking until after the
> eventfd signaling. What do you think? Thanks,
>
> Alex
>

Thanks Alex. The updated code looks cleaner. I will make the above changes.

Regards,
Abhishek

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index dbe942bcaa67..93169b7d6da2 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -332,32 +332,27 @@ vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>  	return ret;
>  }
>
> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
> -				     bool resume_callback)
> +static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>  {
> -	/*
> -	 * The vdev power related flags are protected with 'memory_lock'
> -	 * semaphore.
> -	 */
> -	down_write(&vdev->memory_lock);
> -	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
> -		up_write(&vdev->memory_lock);
> -		return;
> -	}
> -
>  	if (vdev->pm_runtime_engaged) {
>  		vdev->pm_runtime_engaged = false;
>  		pm_runtime_get_noresume(&vdev->pdev->dev);
> -	}
> -
> -	if (vdev->pm_wake_eventfd_ctx) {
> -		if (resume_callback)
> -			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>
> -		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> -		vdev->pm_wake_eventfd_ctx = NULL;
> +		if (vdev->pm_wake_eventfd_ctx) {
> +			eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> +			vdev->pm_wake_eventfd_ctx = NULL;
> +		}
>  	}
> +}
>
> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
> +{
> +	/*
> +	 * The vdev power related flags are protected with 'memory_lock'
> +	 * semaphore.
> +	 */
> +	down_write(&vdev->memory_lock);
> +	__vfio_pci_runtime_pm_exit(vdev);
>  	up_write(&vdev->memory_lock);
>  }
>
> @@ -373,22 +368,13 @@ static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
>  		return ret;
>
>  	/*
> -	 * The device should already be resumed by the vfio core layer.
> -	 * vfio_pci_runtime_pm_exit() will internally increment the usage
> -	 * count corresponding to pm_runtime_put() called during low power
> -	 * feature entry.
> -	 *
> -	 * For the low power entry happened with wakeup eventfd, there will
> -	 * be two cases:
> -	 *
> -	 * 1. The device has gone into runtime suspended state. In this case,
> -	 *    the runtime resume by the vfio core layer should already have
> -	 *    performed all exit related handling and the
> -	 *    vfio_pci_runtime_pm_exit() will return early.
> -	 * 2. The device was in runtime active state. In this case, the
> -	 *    vfio_pci_runtime_pm_exit() will do all the required handling.
> +	 * The device is always in the active state here due to pm wrappers
> +	 * around ioctls. If the device had entered a low power state and
> +	 * pm_wake_eventfd_ctx is valid, vfio_pci_core_runtime_resume() has
> +	 * already signaled the eventfd and exited low power mode itself.
> +	 * pm_runtime_engaged protects the redundant call here.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	return 0;
>  }
>
> @@ -425,15 +411,19 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>  {
>  	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
>
> -	if (vdev->pm_intx_masked)
> -		vfio_pci_intx_unmask(vdev);
> -
>  	/*
> -	 * Only for the low power entry happened with wakeup eventfd,
> -	 * the vfio_pci_runtime_pm_exit() will perform exit related handling
> -	 * and will trigger eventfd. For the other cases, it will return early.
> +	 * Resume with a pm_wake_eventfd_ctx signals the eventfd and exits
> +	 * low power mode.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, true);
> +	down_write(&vdev->memory_lock);
> +	if (vdev->pm_wake_eventfd_ctx) {
> +		eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
> +		__vfio_pci_runtime_pm_exit(vdev);
> +	}
> +	up_write(&vdev->memory_lock);
> +
> +	if (vdev->pm_intx_masked)
> +		vfio_pci_intx_unmask(vdev);
>
>  	return 0;
>  }
> @@ -553,7 +543,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>  	 * the vfio_pci_set_power_state() will change the device power state
>  	 * to D0.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	pm_runtime_resume(&pdev->dev);
>
>  	/*
>
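The following is a small userspace model in C of the exit flow as restructured in the counter-patch above. It is not kernel code and does not call any kernel API. Every model_* name is hypothetical. The memory_lock rw_semaphore is reduced to a flag, so calling the pre-locked core without the lock trips an assert. struct eventfd_ctx is reduced to a reference count plus a signal counter, and the runtime PM usage count is reduced to an int. The model also assumes that a second entry while pm_runtime_engaged is set returns -EINVAL; the quoted hunks only partly show this. The point it illustrates is the split into a locking wrapper and a pre-locked core. The resume callback takes the lock once, signals, and exits. A later LOW_POWER_EXIT then finds pm_runtime_engaged cleared and does nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct eventfd_ctx: a reference count plus
 * the counter that eventfd_signal(ctx, 1) would increment. */
struct model_eventfd {
	int refs;
	unsigned long count;
};

/* The few vfio_pci_core_device fields this path touches. The rwsem is
 * reduced to a flag so that lock misuse trips an assert. */
struct model_vdev {
	bool memory_lock_held;
	int usage_count;              /* models the runtime PM usage count */
	bool pm_runtime_engaged;
	struct model_eventfd *pm_wake_eventfd_ctx;
};

static void model_down_write(struct model_vdev *v)
{
	assert(!v->memory_lock_held);
	v->memory_lock_held = true;
}

static void model_up_write(struct model_vdev *v)
{
	assert(v->memory_lock_held);
	v->memory_lock_held = false;
}

/* Both entry variants; efd == NULL models plain LOW_POWER_ENTRY. */
static int model_pm_entry(struct model_vdev *v, struct model_eventfd *efd)
{
	if (efd)
		efd->refs++;                  /* models eventfd_ctx_fdget() */
	model_down_write(v);
	if (v->pm_runtime_engaged) {
		model_up_write(v);
		if (efd)
			efd->refs--;          /* models eventfd_ctx_put() on error */
		return -EINVAL;
	}
	v->pm_runtime_engaged = true;
	v->pm_wake_eventfd_ctx = efd;
	v->usage_count--;                     /* models pm_runtime_put_noidle() */
	model_up_write(v);
	return 0;
}

/* Pre-locked core, analogous to the proposed __vfio_pci_runtime_pm_exit(). */
static void model_pm_exit_locked(struct model_vdev *v)
{
	assert(v->memory_lock_held);          /* caller must hold the lock */
	if (v->pm_runtime_engaged) {
		v->pm_runtime_engaged = false;
		v->usage_count++;             /* models pm_runtime_get_noresume() */
		if (v->pm_wake_eventfd_ctx) {
			v->pm_wake_eventfd_ctx->refs--;  /* models eventfd_ctx_put() */
			v->pm_wake_eventfd_ctx = NULL;
		}
	}
}

/* Locking wrapper: models the LOW_POWER_EXIT and device-disable paths. */
static void model_pm_exit(struct model_vdev *v)
{
	model_down_write(v);
	model_pm_exit_locked(v);
	model_up_write(v);
}

/* Runtime resume callback: only a wakeup entry signals and exits here. */
static void model_runtime_resume(struct model_vdev *v)
{
	model_down_write(v);
	if (v->pm_wake_eventfd_ctx) {
		v->pm_wake_eventfd_ctx->count++;      /* models eventfd_signal() */
		model_pm_exit_locked(v);
	}
	model_up_write(v);
}

/* One entry -> optional resume -> LOW_POWER_EXIT sequence. 'suspended'
 * means the device reached runtime suspend, so the resume callback runs
 * (host access or the ioctl PM wrapper) before the exit handler. */
static void model_run(struct model_vdev *v, struct model_eventfd *efd,
		      bool suspended)
{
	model_pm_entry(v, efd);
	if (suspended)
		model_runtime_resume(v);
	model_pm_exit(v);
}
```

Start each model_vdev with usage_count = 1 and each model_eventfd with refs = 1 (the user's own reference). With an eventfd and suspended == true, model_run() should leave one signal, the usage count back at 1, and only the user's reference. With suspended == false (runtime PM disabled on the host), the exit still rebalances the usage count and drops the eventfd reference, but sends no signal. That matches the commit message.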