On 2024-10-11 at 12:01:48 +0200, Stephan Gerhold <stephan.gerhold@xxxxxxxxxx> wrote:
> On Thu, Oct 10, 2024 at 09:42:46AM +0200, Johan Hovold wrote:
>> When using the in-kernel pd-mapper on x1e80100, client drivers often
>> fail to communicate with the firmware during boot, which specifically
>> breaks battery and USB-C altmode notifications. This has been observed
>> to happen on almost every second boot (41%) but likely depends on probe
>> order:
>>
>> pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>> pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>>
>> ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>>
>> qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
>>
>> In the same setup audio also fails to probe, albeit much more rarely:
>>
>> PDR: avs/audio get domain list txn wait failed: -110
>> PDR: service lookup for avs/audio failed: -110
>>
>> Chris Lew has provided an analysis and is working on a fix for the
>> ECANCELED (125) errors, but it is not yet clear whether this will also
>> address the audio regression.
>>
>> Even if this was first observed on x1e80100 there is currently no reason
>> to believe that these issues are specific to that platform.
>>
>> Disable the in-kernel pd-mapper for now, and make sure to backport this
>> to stable to prevent users and distros from migrating away from the
>> user-space service.
>>
>> Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
>> Cc: stable@xxxxxxxxxxxxxxx # 6.11
>> Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@xxxxxxxxxxxxxxxxxxxx/
>> Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx>
>> ---
>>
>> It's now been over two months since I reported this regression, and even
>> if we seem to be making some progress on at least some of these issues I
>> think we need to disable the pd-mapper temporarily until the fixes are in
>> place (e.g. to prevent distros from dropping the user-space service).
>>
>
> This is just a random thought, but I wonder if we could insert a delay
> somewhere as a temporary workaround to make the in-kernel pd-mapper more
> reliable. I just tried replicating the userspace pd-mapper timing on
> X1E80100 CRD by:
>
> 1. Disabling auto-loading of qcom_pd_mapper
>    (modprobe.blacklist=qcom_pd_mapper)
> 2. Adding a systemd service that does nothing except running
>    "modprobe qcom_pd_mapper" at the same point in time where the
>    userspace pd-mapper would usually be started.

Thank you so much for this idea. I'm currently using this workaround on
my sdm845 device, where the in-kernel pd-mapper breaks the out-of-tree
call audio functionality.

Is there any ongoing work on making the timing of the in-kernel
pd-mapper more reliable?

Cheers,
Frank

> This seems to work quite well for me; I haven't seen any of the
> mentioned errors anymore in a couple of boot tests. Clearly, there is no
> actual bug in the in-kernel pd-mapper, only worse timing.
>
> Thanks,
> Stephan