On 5/18/2022 11:21 PM, Alex Williamson wrote:
> On Wed, 18 May 2022 16:46:08 +0530
> Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote:
>
>> Currently, there is very limited power management support available
>> in the upstream vfio-pci driver. If there is no user of a vfio-pci
>> device, then it will be moved into the D3hot state. Similarly, if we
>> enable runtime power management for the vfio-pci device in the guest
>> OS, then the device is runtime suspended (for a Linux guest OS) and
>> the PCI device will be put into the D3hot state (in function
>> vfio_pm_config_write()). If the D3cold state can be used instead of
>> D3hot, then it will help in saving maximum power. The D3cold state is
>> not reachable with native PCI PM. It requires interaction with
>> platform firmware, which is system-specific. To go into low power
>> states (including D3cold), the runtime PM framework can be used,
>> which internally interacts with PCI and platform firmware and puts
>> the device into the lowest possible D-state.
>>
>> This patch series registers the vfio-pci driver with the runtime
>> PM framework and uses it to move the physical PCI device into the
>> low power state for unused idle devices. There will be a separate
>> patch series that adds support for using the runtime PM framework
>> for used idle devices.
>>
>> The current PM support was added with commit 6eb7018705de ("vfio-pci:
>> Move idle devices to D3hot power state") where the following point was
>> mentioned regarding D3cold state.
>>
>> "It's tempting to try to use D3cold, but we have no reason to inhibit
>> hotplug of idle devices and we might get into a loop of having the
>> device disappear before we have a chance to try to use it."
>>
>> With runtime PM, if the user wants to prevent the device from going
>> into D3cold, then /sys/bus/pci/devices/.../d3cold_allowed can be set
>> to 0 for the devices where the above behavior is required, instead of
>> disallowing the D3cold state in all cases.
>>
>> The BAR access needs to be disabled if the device is in D3hot state.
>> Also, there should not be any config access if the device is in
>> D3cold state. For SR-IOV, the PF power state should be higher than
>> the VF's power state.
>>
>> * Changes in v5
>>
>> - Rebased over https://github.com/awilliam/linux-vfio/tree/next.
>> - Renamed vfio_pci_lock_and_set_power_state() to
>>   vfio_lock_and_set_power_state() and made it static.
>> - Inside vfio_pci_core_sriov_configure(), protected setting of
>>   power state and sriov enablement with 'memory_lock'.
>> - Removed CONFIG_PM macro use since it is not needed with current
>>   code.
>
> Applied to vfio next branch for v5.19. Thanks!
>
> Alex
>

Thanks Alex for your thorough review and support in getting this
series merged.

I will start exploring the second part and look for a generic way to
support all the use cases.

Regards,
Abhishek

>> * Changes in v4
>> (https://lore.kernel.org/lkml/20220517100219.15146-1-abhsahu@xxxxxxxxxx)
>>
>> - Rebased over https://github.com/awilliam/linux-vfio/tree/next.
>> - Split the patch series into 2 parts. This part contains the patches
>>   for using runtime PM for unused idle device.
>> - Used 'pdev->current_state' to check whether the device is in D3
>>   state.
>> - Added the check in the __vfio_pci_memory_enabled() function itself
>>   instead of adding a power state check at each caller.
>> - Made vfio_pci_lock_and_set_power_state() global since it is needed
>>   in different files.
>> - Used vfio_pci_lock_and_set_power_state() instead of
>>   vfio_pci_set_power_state() before pci_enable_sriov().
>> - Inside vfio_pci_core_sriov_configure(), handled both the cases
>>   (the device is in low power state with and without user).
>> - Used list_for_each_entry_continue_reverse() in
>>   vfio_pci_dev_set_pm_runtime_get().
>>
>> * Changes in v3
>> (https://lore.kernel.org/lkml/20220425092615.10133-1-abhsahu@xxxxxxxxxx)
>>
>> - Rebased patches on v5.18-rc3.
>> - Marked this series as PATCH instead of RFC.
>> - Addressed the review comments given in v2.
>> - Removed the limitation to keep device in D0 state if there is any
>>   access from host side. This is specific to NVIDIA use case and
>>   will be handled separately.
>> - Used the existing DEVICE_FEATURE IOCTL itself instead of adding new
>>   IOCTL for power management.
>> - Removed all custom code related to power management in runtime
>>   suspend/resume callbacks and IOCTL handling. Now, the callbacks
>>   contain code related to INTx handling and a few other things, and
>>   all the PCI state and platform PM handling will be done by PCI core
>>   functions themselves.
>> - Added support for wake-up in the main vfio layer itself since now
>>   we have more vfio/pci based drivers.
>> - Instead of assigning the 'struct dev_pm_ops' in individual parent
>>   driver, now the vfio_pci_core itself assigns the 'struct dev_pm_ops'.
>> - Added handling of power management around SR-IOV handling.
>> - Moved the setting of drvdata in a separate patch.
>> - Masked INTx during the runtime suspended state.
>> - Changed the order of patches so that fix-related patches are at the
>>   beginning of this patch series.
>> - Removed storing the power state locally and used one new boolean to
>>   track the d3 (D3cold and D3hot) power state.
>> - Removed check for IO access in D3 power state.
>> - Used another helper function vfio_lock_and_set_power_state() instead
>>   of touching vfio_pci_set_power_state().
>> - Considered the fixes made in
>>   https://lore.kernel.org/lkml/20220217122107.22434-1-abhsahu@xxxxxxxxxx
>>   and updated the patches accordingly.
>>
>> * Changes in v2
>> (https://lore.kernel.org/lkml/20220124181726.19174-1-abhsahu@xxxxxxxxxx)
>>
>> - Rebased patches on v5.17-rc1.
>> - Included the patch to handle BAR access in D3cold.
>> - Included the patch to fix memory leak.
>> - Made a separate IOCTL that can be used to change the power state from
>>   D3hot to D3cold and D3cold to D0.
>> - Addressed the review comments given in v1.
>>
>> * v1
>> https://lore.kernel.org/lkml/20211115133640.2231-1-abhsahu@xxxxxxxxxx/
>>
>> Abhishek Sahu (4):
>>   vfio/pci: Invalidate mmaps and block the access in D3hot power state
>>   vfio/pci: Change the PF power state to D0 before enabling VFs
>>   vfio/pci: Virtualize PME related registers bits and initialize to zero
>>   vfio/pci: Move the unused device into low power state with runtime PM
>>
>>  drivers/vfio/pci/vfio_pci_config.c |  56 ++++++++-
>>  drivers/vfio/pci/vfio_pci_core.c   | 178 ++++++++++++++++++++---------
>>  2 files changed, 178 insertions(+), 56 deletions(-)
>>
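
For reference, the d3cold_allowed attribute mentioned in the cover letter
can be toggled from user space by writing 0 or 1 to it. Below is a
minimal user-space sketch, not part of the series. The helper names
set_d3cold_allowed() and get_d3cold_allowed() are made up for
illustration, and the caller has to pass the full attribute path, since
the device address is elided ("...") in the cover letter. Writing to the
real sysfs file is assumed to need root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "0" or "1" to a d3cold_allowed attribute.
 * attr_path is the full path to the attribute file; it is supplied by
 * the caller because the device address is not given in the thread.
 * Returns 0 on success, -1 on failure.
 */
static int set_d3cold_allowed(const char *attr_path, int allowed)
{
	FILE *f;
	int rc;

	assert(attr_path != NULL);

	f = fopen(attr_path, "w");
	if (!f)
		return -1;

	rc = (fputs(allowed ? "1\n" : "0\n", f) == EOF) ? -1 : 0;

	/* fclose() flushes the write, so its result matters too. */
	if (fclose(f) != 0)
		rc = -1;

	return rc;
}

/*
 * Hypothetical helper: read the current value back.
 * Returns the integer stored in the file, or -1 if it cannot be
 * opened or parsed.
 */
static int get_d3cold_allowed(const char *attr_path)
{
	FILE *f;
	int value;

	assert(attr_path != NULL);

	f = fopen(attr_path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &value) != 1)
		value = -1;

	fclose(f);
	return value;
}
```

Calling set_d3cold_allowed(path, 0) on a device's attribute corresponds
to the "set to 0" case described in the cover letter: that one device is
kept out of D3cold while others can still use it.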