Re: [PATCH v5 5/5] vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP

On 7/22/2022 4:04 AM, Alex Williamson wrote:
> On Tue, 19 Jul 2022 17:45:23 +0530
> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote:
> 
>> This patch implements the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
>> device feature. With VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, any host-side
>> access to the VFIO device moves the device out of the low power state
>> without involving the user's guest driver. Once the access has finished,
>> the device is moved back into the low power state. When low power entry
>> happens through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP instead,
>> the device is not moved back into the low power state, and the user is
>> notified by triggering the wakeup eventfd.
>>
>> vfio_pci_runtime_pm_entry() is called for both variants of low power
>> entry, so add an extra argument for the wakeup eventfd context and
>> store it in 'struct vfio_pci_core_device'.
>>
>> For entry without a wakeup eventfd, all exit-related handling is done
>> by the LOW_POWER_EXIT device feature only. When LOW_POWER_EXIT is
>> called, the vfio core layer's vfio_device_pm_runtime_get() increments
>> the usage count and resumes the device. In the driver's runtime_resume
>> callback, 'pm_wake_eventfd_ctx' is NULL, so vfio_pci_runtime_pm_exit()
>> returns early. vfio_pci_core_pm_exit() then calls
>> vfio_pci_runtime_pm_exit() again, and this time the exit-related
>> handling is done.
>>
>> For entry with a wakeup eventfd, the driver's resume callback triggers
>> the eventfd and does all the exit-related handling. When
>> vfio_pci_core_pm_exit() later calls vfio_pci_runtime_pm_exit(), it
>> returns early. But if the user has disabled runtime PM on the host
>> side, the device never enters the runtime suspended state, and in this
>> case all the exit-related handling is done in vfio_pci_core_pm_exit()
>> only. The eventfd is also not triggered, since the device power state
>> has not been changed by the host driver.
>>
>> vfio_pci_core_disable() also needs to do all the exit-related handling
>> if the user closes the device after putting it into low power. In this
>> case the eventfd is not triggered, since the device close was initiated
>> by the user.
>>
>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
>> ---
>>  drivers/vfio/pci/vfio_pci_core.c | 78 ++++++++++++++++++++++++++++++--
>>  include/linux/vfio_pci_core.h    |  1 +
>>  2 files changed, 74 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 726a6f282496..dbe942bcaa67 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -259,7 +259,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
>>  	return ret;
>>  }
>>  
>> -static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>> +static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
>> +				     struct eventfd_ctx *efdctx)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>> @@ -272,6 +273,7 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
>>  	}
>>  
>>  	vdev->pm_runtime_engaged = true;
>> +	vdev->pm_wake_eventfd_ctx = efdctx;
>>  	pm_runtime_put_noidle(&vdev->pdev->dev);
>>  	up_write(&vdev->memory_lock);
>>  
>> @@ -295,21 +297,67 @@ static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
>>  	 * while returning from the ioctl and then the device can go into
>>  	 * runtime suspended state.
>>  	 */
>> -	return vfio_pci_runtime_pm_entry(vdev);
>> +	return vfio_pci_runtime_pm_entry(vdev, NULL);
>>  }
>>  
>> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>> +static int
>> +vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>> +				   void __user *arg, size_t argsz)
>> +{
>> +	struct vfio_pci_core_device *vdev =
>> +		container_of(device, struct vfio_pci_core_device, vdev);
>> +	struct vfio_device_low_power_entry_with_wakeup entry;
>> +	struct eventfd_ctx *efdctx;
>> +	int ret;
>> +
>> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
>> +				 sizeof(entry));
>> +	if (ret != 1)
>> +		return ret;
>> +
>> +	if (copy_from_user(&entry, arg, sizeof(entry)))
>> +		return -EFAULT;
>> +
>> +	if (entry.wakeup_eventfd < 0)
>> +		return -EINVAL;
>> +
>> +	efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
>> +	if (IS_ERR(efdctx))
>> +		return PTR_ERR(efdctx);
>> +
>> +	ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
>> +	if (ret)
>> +		eventfd_ctx_put(efdctx);
>> +
>> +	return ret;
>> +}
>> +
>> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
>> +				     bool resume_callback)
>>  {
>>  	/*
>>  	 * The vdev power related flags are protected with 'memory_lock'
>>  	 * semaphore.
>>  	 */
>>  	down_write(&vdev->memory_lock);
>> +	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
>> +		up_write(&vdev->memory_lock);
>> +		return;
>> +	}
>> +
>>  	if (vdev->pm_runtime_engaged) {
>>  		vdev->pm_runtime_engaged = false;
>>  		pm_runtime_get_noresume(&vdev->pdev->dev);
>>  	}
>>  
>> +	if (vdev->pm_wake_eventfd_ctx) {
>> +		if (resume_callback)
>> +			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>> +
>> +		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
>> +		vdev->pm_wake_eventfd_ctx = NULL;
>> +	}
>> +
>>  	up_write(&vdev->memory_lock);
>>  }
>>  
> 
> I find the pm_exit handling here confusing.  We only have one caller
> that can signal the eventfd, so it seems cleaner to me to have that
> caller do the eventfd signal.  We can then remove the arg to pm_exit
> and pull the core of it out to a pre-locked function for that call
> path.  Something like below (applies on top of this patch).  Also moved
> the intx unmasking until after the eventfd signaling.  What do you
> think?  Thanks,
> 
> Alex
> 

 Thanks Alex. The updated code looks cleaner.
 I will make the above changes.

 Regards,
 Abhishek

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index dbe942bcaa67..93169b7d6da2 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -332,32 +332,27 @@ vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>  	return ret;
>  }
>  
> -static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev,
> -				     bool resume_callback)
> +static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
>  {
> -	/*
> -	 * The vdev power related flags are protected with 'memory_lock'
> -	 * semaphore.
> -	 */
> -	down_write(&vdev->memory_lock);
> -	if (resume_callback && !vdev->pm_wake_eventfd_ctx) {
> -		up_write(&vdev->memory_lock);
> -		return;
> -	}
> -
>  	if (vdev->pm_runtime_engaged) {
>  		vdev->pm_runtime_engaged = false;
>  		pm_runtime_get_noresume(&vdev->pdev->dev);
> -	}
> -
> -	if (vdev->pm_wake_eventfd_ctx) {
> -		if (resume_callback)
> -			eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
>  
> -		eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> -		vdev->pm_wake_eventfd_ctx = NULL;
> +		if (vdev->pm_wake_eventfd_ctx) {
> +			eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
> +			vdev->pm_wake_eventfd_ctx = NULL;
> +		}
>  	}
> +}
>  
> +static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
> +{
> +	/*
> +	 * The vdev power related flags are protected with 'memory_lock'
> +	 * semaphore.
> +	 */
> +	down_write(&vdev->memory_lock);
> +	__vfio_pci_runtime_pm_exit(vdev);
>  	up_write(&vdev->memory_lock);
>  }
>  
> @@ -373,22 +368,13 @@ static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
>  		return ret;
>  
>  	/*
> -	 * The device should already be resumed by the vfio core layer.
> -	 * vfio_pci_runtime_pm_exit() will internally increment the usage
> -	 * count corresponding to pm_runtime_put() called during low power
> -	 * feature entry.
> -	 *
> -	 * For the low power entry happened with wakeup eventfd, there will
> -	 * be two cases:
> -	 *
> -	 * 1. The device has gone into runtime suspended state. In this case,
> -	 *    the runtime resume by the vfio core layer should already have
> -	 *    performed all exit related handling and the
> -	 *    vfio_pci_runtime_pm_exit() will return early.
> -	 * 2. The device was in runtime active state. In this case, the
> -	 *    vfio_pci_runtime_pm_exit() will do all the required handling.
> +	 * The device is always in the active state here due to pm wrappers
> +	 * around ioctls.  If the device had entered a low power state and
> +	 * pm_wake_eventfd_ctx is valid, vfio_pci_core_runtime_resume() has 
> +	 * already signaled the eventfd and exited low power mode itself.
> +	 * pm_runtime_engaged protects the redundant call here.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	return 0;
>  }
>  
> @@ -425,15 +411,19 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>  {
>  	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
>  
> -	if (vdev->pm_intx_masked)
> -		vfio_pci_intx_unmask(vdev);
> -
>  	/*
> -	 * Only for the low power entry happened with wakeup eventfd,
> -	 * the vfio_pci_runtime_pm_exit() will perform exit related handling
> -	 * and will trigger eventfd. For the other cases, it will return early.
> +	 * Resume with a pm_wake_eventfd_ctx signals the eventfd and exits
> +	 * low power mode.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, true);
> +	down_write(&vdev->memory_lock);
> +	if (vdev->pm_wake_eventfd_ctx) {
> +		eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
> +		__vfio_pci_runtime_pm_exit(vdev);
> +	}
> +	up_write(&vdev->memory_lock);
> +
> +	if (vdev->pm_intx_masked)
> +		vfio_pci_intx_unmask(vdev);
>  
>  	return 0;
>  }
> @@ -553,7 +543,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>  	 * the vfio_pci_set_power_state() will change the device power state
>  	 * to D0.
>  	 */
> -	vfio_pci_runtime_pm_exit(vdev, false);
> +	vfio_pci_runtime_pm_exit(vdev);
>  	pm_runtime_resume(&pdev->dev);
>  
>  	/*
> 
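
For readers following the refactor above, here is a minimal userspace
sketch in C of the control flow it arrives at: a pre-locked exit helper,
a locked wrapper for the ioctl and disable paths, and a resume path that
signals the eventfd itself. It is not kernel code. struct model_dev, the
model_* functions and pending_wakeups() are hypothetical names; a plain
bool stands in for memory_lock, an int for the runtime PM usage count,
and a dup()'d eventfd for the eventfd_ctx reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct vfio_pci_core_device. */
struct model_dev {
	bool locked;	/* stand-in for memory_lock held for write */
	bool engaged;	/* stand-in for pm_runtime_engaged */
	int wake_fd;	/* stand-in for pm_wake_eventfd_ctx, -1 if none */
	int usage;	/* stand-in for the runtime PM usage count */
};

static void model_lock(struct model_dev *d)
{
	assert(!d->locked);	/* a nested write lock would deadlock */
	d->locked = true;
}

static void model_unlock(struct model_dev *d)
{
	assert(d->locked);
	d->locked = false;
}

/* Low power entry; wake_fd < 0 models the variant without wakeup. */
static int model_pm_entry(struct model_dev *d, int wake_fd)
{
	int fd = -1;

	if (wake_fd >= 0) {
		fd = dup(wake_fd);	/* take our own reference first */
		if (fd < 0)
			return -1;
	}

	model_lock(d);
	if (d->engaged) {		/* already in low power: reject */
		model_unlock(d);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	d->engaged = true;
	d->wake_fd = fd;
	d->usage--;			/* models pm_runtime_put_noidle() */
	model_unlock(d);
	return 0;
}

/* Pre-locked core, modeled on __vfio_pci_runtime_pm_exit(). */
static void model_pm_exit_locked(struct model_dev *d)
{
	assert(d->locked);
	if (!d->engaged)
		return;			/* redundant call is a no-op */
	d->engaged = false;
	d->usage++;			/* models pm_runtime_get_noresume() */
	if (d->wake_fd >= 0) {
		close(d->wake_fd);	/* drop our reference */
		d->wake_fd = -1;
	}
}

/* Locked wrapper, as used by the exit ioctl and disable paths. */
static void model_pm_exit(struct model_dev *d)
{
	model_lock(d);
	model_pm_exit_locked(d);
	model_unlock(d);
}

/* Resume path: the only caller that signals the eventfd. */
static void model_runtime_resume(struct model_dev *d)
{
	uint64_t one = 1;
	ssize_t n;

	model_lock(d);
	if (d->wake_fd >= 0) {
		n = write(d->wake_fd, &one, sizeof(one));
		(void)n;		/* the sketch ignores write errors */
		model_pm_exit_locked(d);
	}
	model_unlock(d);
}

/* Read and clear the pending count of a non-blocking eventfd. */
static uint64_t pending_wakeups(int fd)
{
	uint64_t n = 0;

	if (read(fd, &n, sizeof(n)) != (ssize_t)sizeof(n))
		return 0;		/* nothing pending */
	return n;
}
```

In this model, calling model_pm_entry() with an eventfd created with
EFD_NONBLOCK and then model_runtime_resume() leaves one pending wakeup
on the caller's fd, and a following model_pm_exit() changes nothing
because engaged is already false. With wake_fd < 0, the resume path
leaves the state alone and only model_pm_exit() restores the usage count.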



