On Tue, 26 Jul 2022 18:17:18 +0530 Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > On 7/26/2022 3:39 AM, Alex Williamson wrote: > > On Mon, 25 Jul 2022 20:10:44 +0530 > > Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > > > >> On 7/22/2022 4:04 AM, Alex Williamson wrote: > >>> On Tue, 19 Jul 2022 17:45:19 +0530 > >>> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > >>> > >>>> This patch adds the following new device features for the low > >>>> power entry and exit in the header file. The implementation for the > >>>> same will be added in the subsequent patches. > >>>> > >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY > >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP > >>>> - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT > >>>> > >>>> With the standard registers, all power states cannot be achieved. The > >>> > >>> We're talking about standard PCI PM registers here, let's make that > >>> clear since we're adding a device agnostic interface here. > >>> > >>>> platform-based power management needs to be involved to go into the > >>>> lowest power state. For doing low power entry and exit with > >>>> platform-based power management, these device features can be used. > >>>> > >>>> The entry device feature has two variants. These two variants are mainly > >>>> to support the different behaviour for the low power entry. > >>>> If there is any access for the VFIO device on the host side, then the > >>>> device will be moved out of the low power state without the user's > >>>> guest driver involvement. Some devices (for example NVIDIA VGA or > >>>> 3D controller) require the user's guest driver involvement for > >>>> each low-power entry. In the first variant, the host can move the > >>>> device into low power without any guest driver involvement while > >>> > >>> Perhaps, "In the first variant, the host can return the device to low > >>> power automatically. The device will continue to attempt to reach low > >>> power until the low power exit feature is called." > >>> > >>>> in the second variant, the host will send a notification to the user > >>>> through eventfd and then the users guest driver needs to move > >>>> the device into low power. > >>> > >>> "In the second variant, if the device exits low power due to an access, > >>> the host kernel will signal the user via the provided eventfd and will > >>> not return the device to low power without a subsequent call to one of > >>> the low power entry features. A call to the low power exit feature is > >>> optional if the user provided eventfd is signaled." > >>> > >>>> These device features only support VFIO_DEVICE_FEATURE_SET operation. > >>> > >>> And PROBE. > >>> > >> > >> Thanks Alex. > >> I will make the above changes in the commit message. > >> > >>>> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> > >>>> --- > >>>> include/uapi/linux/vfio.h | 55 +++++++++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 55 insertions(+) > >>>> > >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >>>> index 733a1cddde30..08fd3482d22b 100644 > >>>> --- a/include/uapi/linux/vfio.h > >>>> +++ b/include/uapi/linux/vfio.h > >>>> @@ -986,6 +986,61 @@ enum vfio_device_mig_state { > >>>> VFIO_DEVICE_STATE_RUNNING_P2P = 5, > >>>> }; > >>>> > >>>> +/* > >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state > >>>> + * with the platform-based power management. This low power state will be > >>> > >>> This is really "allow the device to be moved into a low power state" > >>> rather than actually "move the device into" such a state though, right? > >>> > >> > >> Yes. It will just allow the device to be moved into a low power state. > >> I have addressed all your suggestions in the uAPI description and > >> added the updated description in the last. > >> > >> Can you please check that once and check if it looks okay. > >> > >>>> + * internal to the VFIO driver and the user will not come to know which power > >>>> + * state is chosen. If any device access happens (either from the host or > >>>> + * the guest) when the device is in the low power state, then the host will > >>>> + * move the device out of the low power state first. Once the access has been > >>>> + * finished, then the host will move the device into the low power state again. > >>>> + * If the user wants that the device should not go into the low power state > >>>> + * again in this case, then the user should use the > >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature for the > >>> > >>> This should probably just read "For single shot low power support with > >>> wake-up notification, see > >>> VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below." > >>> > >>>> + * low power entry. The mmap'ed region access is not allowed till the low power > >>>> + * exit happens through VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will > >>>> + * generate the access fault. > >>>> + */ > >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 > >>> > >>> Note that Yishai's DMA logging set is competing for the same feature > >>> entries. We'll need to coordinate. > >>> > >> > >> Since this is last week of rc, so will it possible to consider the updated > >> patches for next kernel (I can try to send them after complete testing the > >> by the end of this week). Otherwise, I can wait for next kernel and then > >> I can rebase my patches. > > > > I think we're close, though I would like to solicit uAPI feedback from > > others on the list. We can certainly sort out feature numbers > > conflicts at commit time if Yishai's series becomes ready as well. I'm > > on PTO for the rest of the week starting tomorrow afternoon, but I'd > > suggest trying to get this re-posted before the end of the week, asking > > for more reviews for the uAPI, and we can evaluate early next week if > > this is ready. > > > > Sure. I will re-post the updated patch series after the complete testing. > > >>>> + > >>>> +/* > >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device into the low power state > >>>> + * with the platform-based power management and provide support for the wake-up > >>>> + * notifications through eventfd. This low power state will be internal to the > >>>> + * VFIO driver and the user will not come to know which power state is chosen. > >>>> + * If any device access happens (either from the host or the guest) when the > >>>> + * device is in the low power state, then the host will move the device out of > >>>> + * the low power state first and a notification will be sent to the guest > >>>> + * through eventfd. Once the access is finished, the host will not move back > >>>> + * the device into the low power state. The guest should move the device into > >>>> + * the low power state again upon receiving the wakeup notification. The > >>>> + * notification will be generated only if the device physically went into the > >>>> + * low power state. If the low power entry has been disabled from the host > >>>> + * side, then the device will not go into the low power state even after > >>>> + * calling this device feature and then the device access does not require > >>>> + * wake-up. The mmap'ed region access is not allowed till the low power exit > >>>> + * happens. The low power exit can happen either through > >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the > >>>> + * wake-up notification has been generated). > >>> > >>> Seems this could leverage a lot more from the previous, simply stating > >>> that this has the same behavior as VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY > >>> with the exception that the user provides an eventfd for notification > >>> when the device resumes from low power and will not try to re-enter a > >>> low power state without a subsequent user call to one of the low power > >>> entry feature ioctls. It might also be worth covering the fact that > >>> device accesses by the user, including region and ioctl access, will > >>> also trigger the eventfd if that access triggers a resume. > >>> > >>> As I'm thinking about this, that latter clause is somewhat subtle. > >>> AIUI a user can call the low power entry with wakeup feature and > >>> proceed to do various ioctl and region (not mmap) accesses that could > >>> perpetually keep the device awake, or there may be dependent devices > >>> such that the device may never go to low power. It needs to be very > >>> clear that only if the wakeup eventfd has the device entered into and > >>> exited a low power state making the low power exit ioctl optional. > >>> > >> > >> Yes. In my updated description, I have added more details. > >> Can you please check if that helps. > >> > >>>> + */ > >>>> +struct vfio_device_low_power_entry_with_wakeup { > >>>> + __s32 wakeup_eventfd; > >>>> + __u32 reserved; > >>>> +}; > >>>> + > >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 > >>>> + > >>>> +/* > >>>> + * Upon VFIO_DEVICE_FEATURE_SET, move the VFIO device out of the low power > >>>> + * state. > >>> > >>> Any ioctl effectively does that, the key here is that the low power > >>> state may not be re-entered after this ioctl. > >>> > >>>> This device feature should be called only if the user has previously > >>>> + * put the device into the low power state either with > >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or > >>>> + * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. If the > >>> > >>> Doesn't really seem worth mentioning, we need to protect against misuse > >>> regardless. > >>> > >>>> + * device is not in the low power state currently, this device feature will > >>>> + * return early with the success status. > >>> > >>> This is an implementation detail, it doesn't need to be part of the > >>> uAPI. Thanks, > >>> > >>> Alex > >>> > >>>> + */ > >>>> +#define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5 > >>>> + > >>>> /* -------- API for Type1 VFIO IOMMU -------- */ > >>>> > >>>> /** > >>> > >> > >> /* > >> * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power > >> * state with the platform-based power management. This low power state will > >> * be internal to the VFIO driver and the user will not come to know which > >> * power state is chosen. If any device access happens (either from the host > > > > Couldn't the user look in sysfs to determine the power state? There > > might also be external hardware monitors of the power state, so this > > statement is a bit overreaching. Maybe something along the lines of... > > > > "Device use of lower power states depend on factors managed by the > > runtime power management core, including system level support and > > coordinating support among dependent devices. Enabling device low > > power entry does not guarantee lower power usage by the device, nor is > > a mechanism provided through this feature to know the current power > > state of the device." > > > >> * or the guest) when the device is in the low power state, then the host will > > > > Let's not confine ourselves to a VM use case, "...from the host or > > through the vfio uAPI". > > > >> * move the device out of the low power state first. Once the access has been > > > > "move the device out of the low power state as necessary prior to the > > access." > > > >> * finished, then the host will move the device into the low power state > > > > "Once the access is completed, the device may re-enter the low power > > state." > > > >> * again. For single shot low power support with wake-up notification, see > >> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. The mmap'ed region > >> * access is not allowed till the low power exit happens through > >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT and will generate the access fault. > > > > "Access to mmap'd device regions is disabled on LOW_POWER_ENTRY and > > may only be resumed after calling LOW_POWER_EXIT." > > > >> */ > >> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 > >> > >> /* > >> * This device feature has the same behavior as > >> * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user > >> * provides an eventfd for wake-up notification. When the device moves out of > >> * the low power state for the wake-up, the host will not try to re-enter a > > > > "...will not allow the device to re-enter a low power state..." > > > >> * low power state without a subsequent user call to one of the low power > >> * entry device feature IOCTLs. The low power exit can happen either through > >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT or through any other access (where the > >> * wake-up notification has been generated). > >> * > >> * The notification through eventfd will be generated only if the device has > >> * gone into the low power state after calling this device feature IOCTL. > > > > "The notification through the provided eventfd will be generated only > > when the device has entered and is resumed from a low power state after > > calling this device feature IOCTL." > > > > > >> * There are various cases where the device will not go into the low power > >> * state after calling this device feature IOCTL (for example, the low power > >> * entry has been disabled from the host side, the user keeps the device busy > >> * after calling this device feature IOCTL, there are dependent devices which > >> * block the device low power entry, etc.) and in such cases, the device access > >> * does not require wake-up. Also, the low power exit through > > > > "A device that has not entered low power state, as managed through the > > runtime power management core, will not generate a notification through > > the provided eventfd on access." > > > >> * VFIO_DEVICE_FEATURE_LOW_POWER_EXIT is mandatory for the cases where the > >> * wake-up notification has not been generated. > > > > "Calling the LOW_POWER_EXIT feature is optional in the case where > > notification has been signaled on the provided eventfd that a resume > > from low power has occurred." > > > > We should also reiterate the statement above about mmap access because > > it would be reasonable for a user to assume that if any access wakes > > the device, that should include mmap faults and therefore a resume > > notification should occur for such an event. We could implement that > > for the one-shot mode, but we're choosing not to. > > > >> */ > >> struct vfio_device_low_power_entry_with_wakeup { > >> __s32 wakeup_eventfd; > >> __u32 reserved; > >> }; > >> > >> #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 > >> > >> /* > >> * Upon VFIO_DEVICE_FEATURE_SET, prevent the VFIO device low power state entry > >> * which has been previously allowed with VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY > >> * or VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. > >> */ > > > > "Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states > > as previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or > > VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. This > > device feature ioctl may itself generate a wakeup eventfd notification > > in the latter case if the device has previously entered a low power > > state." > > > > Thanks, > > Alex > > > > Thanks Alex for your thorough review of uAPI. > I have incorporated all the suggestions. > Following is the updated uAPI. > > /* > * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power > * state with the platform-based power management. Device use of lower power > * states depends on factors managed by the runtime power management core, > * including system level support and coordinating support among dependent > * devices. Enabling device low power entry does not guarantee lower power > * usage by the device, nor is a mechanism provided through this feature to > * know the current power state of the device. If any device access happens > * (either from the host or through the vfio uAPI) when the device is in the > * low power state, then the host will move the device out of the low power > * state as necessary prior to the access. Once the access is completed, the > * device may re-enter the low power state. For single shot low power support > * with wake-up notification, see > * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP below. Access to mmap'd > * device regions is disabled on LOW_POWER_ENTRY and may only be resumed after > * calling LOW_POWER_EXIT. > */ > #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY 3 > > /* > * This device feature has the same behavior as > * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY with the exception that the user > * provides an eventfd for wake-up notification. When the device moves out of > * the low power state for the wake-up, the host will not allow the device to > * re-enter a low power state without a subsequent user call to one of the low > * power entry device feature IOCTLs. Access to mmap'd device regions is > * disabled on LOW_POWER_ENTRY_WITH_WAKEUP and may only be resumed after the > * low power exit. The low power exit can happen either through LOW_POWER_EXIT > * or through any other access (where the wake-up notification has been > * generated). The access to mmap'd device regions will not trigger low power > * exit. > * > * The notification through the provided eventfd will be generated only when > * the device has entered and is resumed from a low power state after > * calling this device feature IOCTL. A device that has not entered low power > * state, as managed through the runtime power management core, will not > * generate a notification through the provided eventfd on access. Calling the > * LOW_POWER_EXIT feature is optional in the case where notification has been > * signaled on the provided eventfd that a resume from low power has occurred. > */ > struct vfio_device_low_power_entry_with_wakeup { > __s32 wakeup_eventfd; > __u32 reserved; > }; > > #define VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP 4 > > /* > * Upon VFIO_DEVICE_FEATURE_SET, disallow use of device low power states as > * previously enabled via VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY or > * VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device features. > * This device feature IOCTL may itself generate a wakeup eventfd notification > * in the latter case if the device has previously entered a low power state. > */ > #define VFIO_DEVICE_FEATURE_LOW_POWER_EXIT 5 Looks ok to me. Thanks, Alex