On Mon, Dec 09, 2024 at 02:36:06PM +0100, Christoph Hellwig wrote: > On Thu, Dec 05, 2024 at 07:49:34PM -0600, Bjorn Helgaas wrote: > > Oops, I think I got this part backwards. The patch uses PCI PM if > > d3cold_allowed is set, and it's set by default, so it does what you > > need for the Qualcomm platform *without* user intervention. > > > > But I guess using the flag allows users in other situations to force > > use of NVMe power management by clearing d3cold_allowed via sysfs. > > Does that mean some unspecified other platforms might only work > > correctly with that user intervention? > > Still seems awkward to overload fields like this. > > The istory here is the the NVMe internal power states are significantly > better for the SSDs. It avoid shutting down the SSD frequently, which > creates a lot of extra erase cycles and reduces life time. It also > prevents the SSD from performing maintainance operations while the host > system is idle, which is the perfect time for them. But the idea of > putting all periphals into D3 is gaining a lot of ground because it > makes the platform vendors life a lot simpler at the cost of others. No, I disagree with the last comment. When the system goes to low power mode (like S2R/hibernate), it *does* makes a lot of sense to put the devices into D3Cold to save power. Using NVMe or other endpoint devices internal power states only matters when the system is idle (which is not S2R/hibernate). This is what ACPI is doing currently. Then one might suggest to check the suspend state using 'pm_suspend_target_state' in device drivers and decide to keep the devices in D3Cold. Unfortunately, it won't work for Qcom platforms as most of them (except chromebooks) don't have S2R (a.k.a PSCI_SYSTEM_SUSPEND), but only S2Idle. When it comes to non-Qcom platforms based on devicetree, they also cannot power down the NVMe device during suspend (as they cannot satisfy existing checks in nvme_suspend()). The current reality is that most of the devicetree platforms *do* want to power down the NVMe during suspend. Only when NVMe is used in an OS like Android, we might not want to do so (that's something for future, but still a possibility). So this is how I ended up using the existing 'd3cold_allowed' attribute as it allows the devices to be powered down by default and if the OS doesn't want to do so, it can tweak the attribute from userspace (similar to UFS in Android). The flexibility of this attribute is that it just acts as a hint for the device drivers. If the device driver doesn't want to do D3Cold (when used as a wakeup source etc...) it can still opt out (assuming that the platform would also honor the wakeup capability of the device). - Mani -- மணிவண்ணன் சதாசிவம்