On Mon, Dec 16, 2024 at 6:11 PM Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> wrote: > > On Mon, Dec 16, 2024 at 05:24:40PM +0100, Rafael J. Wysocki wrote: > > On Sat, Dec 14, 2024 at 7:30 AM Manivannan Sadhasivam > > <manivannan.sadhasivam@xxxxxxxxxx> wrote: > > > > > > On Fri, Dec 13, 2024 at 03:35:15PM +0100, Rafael J. Wysocki wrote: > > > > On Thu, Dec 12, 2024 at 4:14 PM Christoph Hellwig <hch@xxxxxx> wrote: > > > > > > > > > > On Thu, Dec 12, 2024 at 01:49:15PM +0100, Ulf Hansson wrote: > > > > > > Right. This seems to somewhat work for ACPI types of systems, because > > > > > > ACPI is controlling the low power state for all the devices. Based on > > > > > > the requested system wide low power state, ACPI can then decide to > > > > > > call pm_set_suspend_via_firmware() or not. > > > > > > > > > > > > Still there is a problem with this for ACPI too. > > > > > > > > > > > > How does ACPI know whether it's actually a good idea to keep the NVMe > > > > > > storage powered in s2idle (ACPI calls pm_set_suspend_via_firmware() > > > > > > only for S2R and S2disk!?)? Especially when my laptop only supports > > > > > > s2idle and that's what I will use when I close the lid. In this way, > > > > > > the NMVe storage will certainly contribute to draining the battery, > > > > > > especially when I won't be using my laptop for a couple of days. > > > > > > > > > > > > In my opinion, we need a better approach that is both flexible and > > > > > > that dynamically adjusts based upon the use case. > > > > > > > > > > Agreed. I'd be happy to work with the PM maintainers to do this, > > > > > but I don't really know enough about the PM core to drive it > > > > > (as the reply from Rafael to my mail makes pretty clear :)) > > > > > > > > I'm here to help. > > > > > > > > Let me know what exactly you want to achieve and we'll see how to make it work. > > > > > > I'll try to summarize the requirement here since I started this thread: > > > > > > Problem statement > > > ================= > > > > > > We need a PM core API that tells the device drivers when it is safe to powerdown > > > the devices. The usecase here is with PCIe based NVMe devices but the problem is > > > applicable to other devices as well. > > > > > > Drivers are relying on couple of options now: > > > > > > 1. If pm_suspend_via_firmware() returns true, then drivers will shutdown the > > > device assuming that the firmware is going to handle the suspend. But this API > > > is currently used only by ACPI. Even there, ACPI relies on S2R being supported > > > by the platform and it sets pm_set_suspend_via_firmware() only when the suspend > > > is S2R. But if the platform doesn't support S2R (current case of most of the > > > Qcom SoCs), then pm_suspend_via_firmware() will return false and NVMe won't be > > > powered down draining the battery. > > > > So my question here would be why is it not powered down always during > > system-wide suspend? > > > > Why exactly is it necessary to distinguish one case from the other > > (assuming that we are talking about system-wide suspend only)? > > > > To support Android like usecase with firmware that only supports > suspend-to-idle (Qcom platforms). This usecase is not applicable right now, but > one can't just rule out the possibility in the near future. This doesn't explain anything to me, sorry. > And the problem is that we need to support both Android and non-Android systems > with same firmware :/ So what technically is the problem? > > There are drivers that use pm_suspend_via_firmware() to check whether > > or not something special needs to be done to the device because if > > "false" is returned, the platform firmware is not going to remove > > power from it. > > > > However, you seem to be talking about the opposite, so doing something > > special to the device if "true" is returned. I'm not sure why this is > > necessary. > > > > Because, since 'false' is returned, drivers like NVMe are assuming that the > platform won't remove power on all DT systems and they just keep the devices in > low power state (not powering them down). But why would I want my NVMe in DT > based laptop to be always powered in system suspend? Because it causes the system to use less energy when suspended. > > > If the platform is using DT, then there is no entity setting > > > pm_set_suspend_via_firmware(). > > > > That's true and so the assumption is that in this case the handling of > > all devices will always be the same regardless of which flavor of > > system suspend is chosen by user space. > > > > Right and that's why the above concern is raised. And it is still unclear to me what the problem with it is. What exactly can go wrong? > > > So NVMe will be kept in low power state all the > > > time (still draining the battery). > > > > So what would be the problem with powering it down unconditionally? > > > > I'm not sure how would you do that (by checking dev_of_node()?). Even so, it > will wear out the NVMe devices if used in Android tablets etc... I understand the wear-out concern. Is there anything else? > > > There were attempts to set this flag from > > > PSCI [1], but there were objections on setting this flag when PSCI_SUSPEND is > > > not supported by the platform (again, the case with Qcom SoCs). Even if this > > > approach succeeds, then there are concerns that if the platform is used in an > > > OS like Android where the S2Idle cycle is far more high, NVMe will wear out > > > very quickly. > > > > I see. > > > > > So this is where the forthcoming API need to "dynamically adjusts > > > based upon the use case" as quoted by Ulf in his previous reply. One way to > > > achieve would be by giving the flexibility to the userspace to choose the > > > suspend state (if platform has options to select). UFS does something similar > > > with 'spm_lvl' [2] sysfs attribute that I believe Android userspace itself makes > > > use of. > > > > Before we're talking about APIs, let's talk about the desired behavior. > > > > It looks like there are cases in which you'd want to turn the device > > off completely (say put it into D3cold in the PCI terminology) and > > there are cases in which you'd want it to stay in a somewhat-powered > > low-power state. > > > > It is unclear to me what they are at this point. > > > > I hope that my above explanation clarifies. Sorry, but not really. > Here is the short version of the suspend requirement across platforms: > > 1. D3Cold (power down) - Laptops/Automotive > 2. D3hot (low power) - Android Tablets Where do the above requirements come from? > FWIW, I did receive feedback from people asking to just ignore the Android > usecase and always power down the devices for DT platforms. But I happen to > disagree with them. Let me know if I was wrong and I should not worry about > Android usecase as it is for the future. I'm not sure what you mean by the "Android usecase" TBH. Do you mean the wear-out concern in the Android usage scenario or is there more to it?