Hi, On Fri, Jun 2, 2023 at 6:13 AM Amit Pundir <amit.pundir@xxxxxxxxxx> wrote: > > On Fri, 2 Jun 2023 at 18:07, Mark Brown <broonie@xxxxxxxxxx> wrote: > > > > > > If you reorder the nodes in the device tree, I think it'll change the > > > > probe order. Does that affect anything? I'm wondering if there's some > > > > sort of delayed reaction from a previous regulator. > > > > > Hi, Bumping lvs1 and lvs2 regulators up to the top of the list in the > > > DTS https://bugs.linaro.org/show_bug.cgi?id=5975#c4 does seem to work. > > > I can't reproduce the crash in 125 reboots so far, while I'm still > > > testing with only qcom-rpmh-regulator kernel module. I'll do some more > > > testing with full system running and send this re-ordering fix I can't > > > reproduce the crash further. > > > > So whatever the issue is here it's a timing/race condition - this seems > > like a workaround which works just now but it's not getting to whatever > > the actual issue is and that could come back. > > Hi, I'm happy to debug this issue further or test run any > patches/ideas if that helps. I guess it's a better workaround than reverting the async patch. ...at least it has a chance of having a real effect. That being said, it's still a bit of a hack. Ideally you'd be able to somehow get information about RPMH's state when things fail. Maybe this might be possible with ramdumps or maybe with JTAG. Unfortunately, I'm not super familiar with either of them. I assume you aren't either or you would have already tried them... >From a black box perspective, I guess the things I could think of would be to keep poking around with things that you control. Best ideas I have: 1. Use "bisect" style techniques to figure out how much you really need to move the "lvs" regulators. Try moving it halfway up the list. If that works, move it closer to the bottom. If that doesn't work, move it closer to the top. Eventually you'd find out which regulator it's important to be before. 2. Try adding some delays to some of the regulators with "regulator-enable-ramp-delay" and/or "regulator-settling-time-us". Without a scope, it'll be tricky to figure out exactly which regulators might need delays, but you could at least confirm if the "overkill" approach of having all the regulators have some delay helps... I guess you could also try putting a big delay for "ldo26". If that works, you could try moving it up (again using a bisect style approach) to see where the delay matters? Currently, I guess my mental model of what might be going wrong is that regulators are all turning on / adjusting really quickly. Maybe they aren't switching into "high power mode" quickly enough, maybe they are busy ramping up or down, or maybe there's simply some other issue, but I suppose that something happening could be causing the voltage to droop down (or be too high) and then that's making RPMH upset. Changing the order could be helping avoid this droop, but the more proper fix would be to actually account for it with regulator constraints. My mental model still doesn't really explain what async probe has to do with anything, at least if you're loading _just_ rpmh-regulator all on its own. Assuming you're loading rpmh-regulator all on its own then async probe should do almost nothing. Unless I'm mistaken, async probe won't cause all the RPMH regulators to initialize in parallel. They still initialize synchronously in the rpmh-regulator probe routine. Async probe would _just_ allow some other driver to run its probe at the same time. ...but if no other drivers being loaded at the same time then I'm perplexed about how it could make any difference. -Doug