On Mon, 15 May 2023 at 20:33, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, May 15, 2023 at 7:42 AM Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
> >
> > On Sun, 14 May 2023 at 18:11, Caleb Connolly <caleb.connolly@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 13/05/2023 18:08, Amit Pundir wrote:
> > > > On Fri, 24 Mar 2023 at 19:05, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > >>
> > > >> This reverts commit 58973046c1bf ("regulator: qcom-rpmh: Use
> > > >> PROBE_FORCE_SYNCHRONOUS"). Further digging into the problems that
> > > >> prompted the us to switch to synchronous probe showed that the root
> > > >> cause was a missing "rootwait" in the kernel command line
> > > >> arguments. Let's reinstate asynchronous probe.
> > > >
> > > > Hi, the asynchronous probe is broken on Dragonboard 845c (SDM845)
> > > > running AOSP (Android Open Source Project) with v6.4-rc1
> > > > https://bugs.linaro.org/show_bug.cgi?id=5975.
> > > > Can we please go back to synchronous probe.
> > > >
> > > > AOSP do not make use of rootwait, IIRC, but it is added by the
> > > > bootloader anyway. And the device fails to boot AOSP regardless of
> > > > "rootwait" bootarg being present or not.
> > >
> > > Could you try applying this diff to enable some log spam and let me know
> > > what you get? I'm keen to try and figure this one out. My mail client
> > > might crunch this a bit so I have pasted it here too
> > > https://p.calebs.dev/ab74b7@raw
> >
> > These prints add just enough delay for the UFS probe to succeed that I
> > can't reproduce the failure anymore.
>
> I'd prefer doing at least a little debugging before jumping to a
> revert. From looking at your dmesg [1], it looks as if the async probe
> is allowing RPMH to probe at the same time as "qcom-vadc-common".
> That's something that talks on the SPMI bus and is (potentially)
> talking to the same PMICs that RPMH-regulator is, right?
> I'm by no means an expert on how Qualcomm's PMICs work, but it seems
> plausible that the "qcom-vadc-common" is somehow causing problems and
> screwing up RPMH. Does that seem plausible to you?
>
> If so, one interesting way to track it down would be to move around
> delays. Put ~500ms sleep at the _end_ of vadc_probe(). Presumably that
> _won't_ fix the problem. Now put a ~500ms sleep at the beginning of
> vadc_probe(). Maybe that will fix the problem? If so, you can move the
> delay around to narrow down the conflict. My wild guess would be that
> vadc_reset() could be throwing things for a loop?
>
> If the above doesn't work, maybe we could add more tracing / printouts
> to see what is probing at the same time as RPMH?

Tried out a few changes today, but none of them worked or were
effective enough to debug this crash further, other than setting
fw_devlink=permissive.

Adding more tracing / prints (BOOTTIME_TRACING and
FUNCTION_GRAPH_TRACER) didn't work and didn't help in reproducing the
crash either. They added just enough delay to boot the device
successfully every time.

I tried to reason about the kernel modules which get loaded before and
after the qcom-rpmh-regulator (QCOM_REBOOT_MODE, QCOM_PON, IIO/VADC,
SPMI_PMIC* etc) as suggested, but I run into the same crash even if I
disable those driver modules. So I don't think that the other driver
modules which get loaded at around the same time as
qcom-rpmh-regulator by default have any impact on this failure.

The only way I can boot successfully every time is if I boot with the
fw_devlink=permissive bootarg. So I'll have to check if there is any
new dependency which got added recently in DT or somewhere else that
is causing this breakage.

Regards,
Amit Pundir

>
>
> [1] https://bugs.linaro.org/attachment.cgi?id=1135