On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@xxxxxxxxxx> wrote:
>
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
>
> This is effectively a more educated follow-up to [1].
>
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.

In my opinion, this is not really the correct way to do it. Using
pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
for PSCI like this. Let me elaborate.

If the NVMe storage device shares a power rail with the CPU cluster,
then yes, we should use PSCI to control it. But is that really the
case?

If so, there are in principle two ways forward to deal with this
correctly.

1) If PSCI OSI mode is being used, the corresponding NVMe storage
device should be hooked up to the CPU PM cluster domain via genpd and
controlled like any other device sharing the cluster rail. In this way,
genpd together with the cpuidle-psci-domain can decide whether it's
okay to turn off the cluster. I believe this is the preferred way, but
2) would work fine too.

2) If PSCI PC mode is being used, a separate channel/interface to the
FW (like SCMI or rpmh in the QC case) should inform the FW whether the
NVMe device needs its power to stay on. The PSCI FW should then take
this information into account when it decides which low-power state to
enter, which ultimately determines whether the cluster rail can be
turned off or not.

Assuming PSCI OSI mode is used here: if 1) doesn't work for you, please
elaborate on why, so we can help to make it work, as it should.

[...]

Kind regards
Uffe