On 7/26/2022 3:39 AM, Alex Williamson wrote: > On Mon, 25 Jul 2022 20:10:44 +0530 > Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > >> On 7/22/2022 4:04 AM, Alex Williamson wrote: >>> On Tue, 19 Jul 2022 17:45:19 +0530 >>> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: >>> >>>> This patch adds the following new device features for the low >>>> power entry and exit in the header file. The implementation for the >>>> same will be added in the subsequent patches. >>>> >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT >>>> >>>> With the standard registers, all power states cannot be achieved. The >>> >>> We're talking about standard PCI PM registers here, let's make that >>> clear since we're adding a device agnostic interface here. >>> >>>> platform-based power management needs to be involved to go into the >>>> lowest power state. For doing low power entry and exit with >>>> platform-based power management, these device features can be used. >>>> >>>> The entry device feature has two variants. These two variants are mainly >>>> to support the different behaviour for the low power entry. >>>> If there is any access for the VFIO device on the host side, then the >>>> device will be moved out of the low power state without the user's >>>> guest driver involvement. Some devices (for example NVIDIA VGA or >>>> 3D controller) require the user's guest driver involvement for >>>> each low-power entry. In the first variant, the host can move the >>>> device into low power without any guest driver involvement while >>> >>> Perhaps, "In the first variant, the host can return the device to low >>> power automatically. The device will continue to attempt to reach low >>> power until the low power exit feature is called." >>> >>>> in the second variant, the host will send a notification to the user >>>> through eventfd and then the users guest driver needs to move >>>> the device into low power. >>> >>> "In the second variant, if the device exits low power due to an access, >>> the host kernel will signal the user via the provided eventfd and will >>> not return the device to low power without a subsequent call to one of >>> the low power entry features. A call to the low power exit feature is >>> optional if the user provided eventfd is signaled." >>> >>>> These device features only support VFIO_DEVICE_FEATURE_SET operation. >>> >>> And PROBE. >>> >> >> Thanks Alex. >> I will make the above changes in the commit message. >> >>>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> >>>> --- >>>> include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++ >>>> 1 file changed, 55 insertions(+) >>>> >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>>> index 733a1cddde30..08fd3482d22b 100644 >>>> --- a/include/uapi/linux/vfio.h >>>> +++ b/include/uapi/linux/vfio.h >>>> @@ -986,6 +986,61 @@ enum vfio_device_mig_state { >>>> VFIO_DEVICE_STATE_RUNNING_P2P = 5, >>>> }; >>>> >>>> +/* >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state >>>> + * with the platform-based power management. This low power state will be >>> >>> This is really "allow the device to be moved into a low power state" >>> rather than actually "move the device into" such a state though, right? >>> >> >> Yes. It will just allow the device to be moved into a low power state. >> I have addressed all your suggestions in the uAPI description and >> added the updated description in the last. >> >> Can you please check that once and check if it looks okay. >> >>>> + * internal to the VFIO driver and the user will not come to know which power >>>> + * state is chosen. If any device access happens (either from the host or >>>> + * the guest) when the device is in the low power state, then the host will >>>> + * move the device out of the low power state first. Once the access has been >>>> + * finished, then the host will move the device into the low power state again. >>>> + * If the user wants that the device should not go into the low power state >>>> + * again in this case, then the user should use the >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature for the >>> >>> This should probably just read "For single shot low power support with >>> wake-up notification, see >>> VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below." >>> >>>> + * low power entry. The mmap'ed region access is not allowed till the low power >>>> + * exit happens through VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will >>>> + * generate the access fault. >>>> + */ >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 >>> >>> Note that Yishai's DMA logging set is competing for the same feature >>> entries. We'll need to coordinate. >>> >> >> Since this is last week of rc, so will it possible to consider the updated >> patches for next kernel (I can try to send them after complete testing the >> by the end of this week). Otherwise, I can wait for next kernel and then >> I can rebase my patches. > > I think we're close, though I would like to solicit uAPI feedback from > others on the list. We can certainly sort out feature numbers > conflicts at commit time if Yishai's series becomes ready as well. I'm > on PTO for the rest of the week starting tomorrow afternoon, but I'd > suggest trying to get this re-posted before the end of the week, asking > for more reviews for the uAPI, and we can evaluate early next week if > this is ready. > Sure. I will re-post the updated patch series after the complete testing. >>>> + >>>> +/* >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state >>>> + * with the platform-based power management and provide support for the wake-up >>>> + * notifications through eventfd. This low power state will be internal to the >>>> + * VFIO driver and the user will not come to know which power state is chosen. >>>> + * If any device access happens (either from the host or the guest) when the >>>> + * device is in the low power state, then the host will move the device out of >>>> + * the low power state first and a notification will be sent to the guest >>>> + * through eventfd. Once the access is finished, the host will not move back >>>> + * the device into the low power state. The guest should move the device into >>>> + * the low power state again upon receiving the wakeup notification. The >>>> + * notification will be generated only if the device physically went into the >>>> + * low power state. If the low power entry has been disabled from the host >>>> + * side, then the device will not go into the low power state even after >>>> + * calling this device feature and then the device access does not require >>>> + * wake-up. The mmap'ed region access is not allowed till the low power exit >>>> + * happens. The low power exit can happen either through >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the >>>> + * wake-up notification has been generated). >>> >>> Seems this could leverage a lot more from the previous, simply stating >>> that this has the same behavior as VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY >>> with the exception that the user provides an eventfd for notification >>> when the device resumes from low power and will not try to re-enter a >>> low power state without a subsequent user call to one of the low power >>> entry feature ioctls. It might also be worth covering the fact that >>> device accesses by the user, including region and ioctl access, will >>> also trigger the eventfd if that access triggers a resume. >>> >>> As I'm thinking about this, that latter clause is somewhat subtle. >>> AIUI a user can call the low power entry with wakeup feature and >>> proceed to do various ioctl and region (not mmap) accesses that could >>> perpetually keep the device awake, or there may be dependent devices >>> such that the device may never go to low power. It needs to be very >>> clear that only if the wakeup eventfd has the device entered into and >>> exited a low power state making the low power exit ioctl optional. >>> >> >> Yes. In my updated description, I have added more details. >> Can you please check if that helps. >> >>>> + */ >>>> +struct vfio_device_low_power_entry_with_wakeup { >>>> + __s32 wakeup_eventfd; >>>> + __u32 reserved; >>>> +}; >>>> + >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 >>>> + >>>> +/* >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device out of the low power >>>> + * state. >>> >>> Any ioctl effectively does that, the key here is that the low power >>> state may not be re-entered after this ioctl. >>> >>>> This device feature should be called only if the user has previously >>>> + * put the device into the low power state either with >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. If the >>> >>> Doesn't really seem worth mentioning, we need to protect against misuse >>> regardless. >>> >>>> + * device is not in the low power state currently, this device feature will >>>> + * return early with the success status. >>> >>> This is an implementation detail, it doesn't need to be part of the >>> uAPI. Thanks, >>> >>> Alex >>> >>>> + */ >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5 >>>> + >>>> /* -------- API for Type1 VFIO IOMMU -------- */ >>>> >>>> /** >>> >> >> /* >> * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power >> * state with the platform-based power management. This low power state will >> * be internal to the VFIO driver and the user will not come to know which >> * power state is chosen. If any device access happens (either from the host > > Couldn't the user look in sysfs to determine the power state? There > might also be external hardware monitors of the power state, so this > statement is a bit overreaching. Maybe something along the lines of... > > "Device use of lower power states depend on factors managed by the > runtime power management core, including system level support and > coordinating support among dependent devices. Enabling device low > power entry does not guarantee lower power usage by the device, nor is > a mechanism provided through this feature to know the current power > state of the device." > >> * or the guest) when the device is in the low power state, then the host will > > Let's not confine ourselves to a VM use case, "...from the host or > through the vfio uAPI". > >> * move the device out of the low power state first. Once the access has been > > "move the device out of the low power state as necessary prior to the > access." > >> * finished, then the host will move the device into the low power state > > "Once the access is completed, the device may re-enter the low power > state." > >> * again. For single shot low power support with wake-up notification, see >> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. The mmap'ed region >> * access is not allowed till the low power exit happens through >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will generate the access fault. > > "Access to mmap'd device regions is disabled on LOW_POWER_ENTRY and > may only be resumed after calling LOW_POWER_EXIT." > >> */ >> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 >> >> /* >> * This device feature has the same behavior as >> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user >> * provides an eventfd for wake-up notification. When the device moves out of >> * the low power state for the wake-up, the host will not try to re-enter a > > "...will not allow the device to re-enter a low power state..." > >> * low power state without a subsequent user call to one of the low power >> * entry device feature IOCTLs. The low power exit can happen either through >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the >> * wake-up notification has been generated). >> * >> * The notification through eventfd will be generated only if the device has >> * gone into the low power state after calling this device feature IOCTL. > > "The notification through the provided eventfd will be generated only > when the device has entered and is resumed from a low power state after > calling this device feature IOCTL." > > >> * There are various cases where the device will not go into the low power >> * state after calling this device feature IOCTL (for example, the low power >> * entry has been disabled from the host side, the user keeps the device busy >> * after calling this device feature IOCTL, there are dependent devices which >> * block the device low power entry, etc.) and in such cases, the device access >> * does not require wake-up. Also, the low power exit through > > "A device that has not entered low power state, as managed through the > runtime power management core, will not generate a notification through > the provided eventfd on access." > >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT is mandatory for the cases where the >> * wake-up notification has not been generated. > > "Calling the LOW_POWER_EXIT feature is optional in the case where > notification has been signaled on the provided eventfd that a resume > from low power has occurred." > > We should also reiterate the statement above about mmap access because > it would be reasonable for a user to assume that if any access wakes > the device, that should include mmap faults and therefore a resume > notification should occur for such an event. We could implement that > for the one-shot mode, but we're choosing not to. > >> */ >> struct vfio_device_low_power_entry_with_wakeup { >> __s32 wakeup_eventfd; >> __u32 reserved; >> }; >> >> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 >> >> /* >> * Upon VFIO_DEVICE_FEATURE_SET, prevent the VFIO device low power state entry >> * which has been previously allowed with VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY >> * or VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. >> */ > > "Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states > as previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or > VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. This > device feature ioctl may itself generate a wakeup eventfd notification > in the latter case if the device has previously entered a low power > state." > > Thanks, > Alex > Thanks Alex for your thorough review of uAPI. I have incorporated all the suggestions. Following is the updated uAPI. /* * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power * state with the platform-based power management. Device use of lower power * states depends on factors managed by the runtime power management core, * including system level support and coordinating support among dependent * devices. Enabling device low power entry does not guarantee lower power * usage by the device, nor is a mechanism provided through this feature to * know the current power state of the device. If any device access happens * (either from the host or through the vfio uAPI) when the device is in the * low power state, then the host will move the device out of the low power * state as necessary prior to the access. Once the access is completed, the * device may re-enter the low power state. For single shot low power support * with wake-up notification, see * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. Access to mmap'd * device regions is disabled on LOW_POWER_ENTRY and may only be resumed after * calling LOW_POWER_EXIT. */ #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 /* * This device feature has the same behavior as * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user * provides an eventfd for wake-up notification. When the device moves out of * the low power state for the wake-up, the host will not allow the device to * re-enter a low power state without a subsequent user call to one of the low * power entry device feature IOCTLs. Access to mmap'd device regions is * disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the * low power exit. The low power exit can happen either through LOW_POWER_EXIT * or through any other access (where the wake-up notification has been * generated). The access to mmap'd device regions will not trigger low power * exit. * * The notification through the provided eventfd will be generated only when * the device has entered and is resumed from a low power state after * calling this device feature IOCTL. A device that has not entered low power * state, as managed through the runtime power management core, will not * generate a notification through the provided eventfd on access. Calling the * LOW_POWER_EXIT feature is optional in the case where notification has been * signaled on the provided eventfd that a resume from low power has occurred. */ struct vfio_device_low_power_entry_with_wakeup { __s32 wakeup_eventfd; __u32 reserved; }; #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 /* * Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as * previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. * This device feature IOCTL may itself generate a wakeup eventfd notification * in the latter case if the device has previously entered a low power state. */ #define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5 Regards, Abhishek