On 15/06/2023 18:09, Amit Pundir wrote:
> On Thu, 15 Jun 2023 at 20:33, Krzysztof Kozlowski
> <krzysztof.kozlowski@xxxxxxxxxx> wrote:
>>
>> On 15/06/2023 15:47, Amit Pundir wrote:
>>> On Thu, 15 Jun 2023 at 00:38, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
>>>>
>>>> On Thu, 15 Jun 2023 at 00:17, Krzysztof Kozlowski
>>>> <krzysztof.kozlowski@xxxxxxxxxx> wrote:
>>>>>
>>>>> On 14/06/2023 20:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>>> On 02.06.23 18:12, Amit Pundir wrote:
>>>>>>> Move lvs1 and lvs2 regulator nodes up in the rpmh-regulators
>>>>>>> list to workaround a boot regression uncovered by the upstream
>>>>>>> commit ad44ac082fdf ("regulator: qcom-rpmh: Revert "regulator:
>>>>>>> qcom-rpmh: Use PROBE_FORCE_SYNCHRONOUS"").
>>>>>>>
>>>>>>> Without this fix DB845c fail to boot at times because one of the
>>>>>>> lvs1 or lvs2 regulators fail to turn ON in time.
>>>>>>
>>>>>> /me waves friendly
>>>>>>
>>>>>> FWIW, as it's not obvious: this...
>>>>>>
>>>>>>> Link: https://lore.kernel.org/all/CAMi1Hd1avQDcDQf137m2auz2znov4XL8YGrLZsw5edb-NtRJRw@xxxxxxxxxxxxxx/
>>>>>>
>>>>>> ...is a report about a regression. One that we could still solve before
>>>>>> 6.4 is out. One I'll likely will point Linus to, unless a fix comes into
>>>>>> sight.
>>>>>>
>>>>>> When I noticed the reluctant replies to this patch I earlier today asked
>>>>>> in the thread with the report what the plan forward was:
>>>>>> https://lore.kernel.org/all/CAD%3DFV%3DV-h4EUKHCM9UivsFHRsJPY5sAiwXV3a1hUX9DUMkkxdg@xxxxxxxxxxxxxx/
>>>>>>
>>>>>> Dough there replied:
>>>>>>
>>>>>> ```
>>>>>> Of the two proposals made (the revert vs. the reordering of the dts),
>>>>>> the reordering of the dts seems better. It only affects the one buggy
>>>>>> board (rather than preventing us to move to async probe for everyone)
>>>>>> and it also has a chance of actually fixing something (changing the
>>>>>> order that regulators probe in rpmh-regulator might legitimately work
>>>>>> around the problem).
>>>>>> That being said, just like the revert the dts
>>>>>> reordering is still just papering over the problem and is fragile /
>>>>>> not guaranteed to work forever.
>>>>>> ```
>>>>>>
>>>>>> Papering over obviously is not good, but has anyone a better idea to fix
>>>>>> this? Or is "not fixing" for some reason an viable option here?
>>>>>>
>>>>>
>>>>> I understand there is a regression, although kernel is not mainline
>>>>> (hash df7443a96851 is unknown) and the only solutions were papering the
>>>>> problem. Reverting commit is a temporary workaround. Moving nodes in DTS
>>>>> is not acceptable because it hides actual problem and only solves this
>>>>> one particular observed problem, while actual issue is still there. It
>>>>> would be nice to be able to reproduce it on real mainline with normal
>>>>> operating system (not AOSP) - with ramdiks/without/whatever. So far no
>>>>> one did it, right?
>>>>
>>>> No, I did not try non-AOSP system yet. I'll try it tomorrow, if that
>>>> helps. With mainline hash.
>>>
>>> Hi, here is the crash report on db845c running vanilla v6.4-rc6 with a
>>> debian build https://bugs.linaro.org/attachment.cgi?id=1142
>>>
>>> And fwiw here is the db845c crash log with AOSP running vanilla
>>> v6.4-rc6 https://bugs.linaro.org/attachment.cgi?id=1141
>>>
>>> Regards,
>>> Amit Pundir
>>>
>>> PS: rootfs in this bug report doesn't matter much because I'm loading
>>> all the kernel modules from a ramdisk and in the case of a crash the
>>> UFS doesn't probe anyway.
>>
>> I just tried current next with defconfig (I could not find your config,
>> neither here, nor in your previous mail thread nor in bugzilla). Also
>> with REGULATOR_QCOM_RPMH as module.
>>
>> I tried also v6.4-rc6 - also defconfig with default and module
>> REGULATOR_QCOM_RPMH.
>>
>> All the cases work on my RB3 - no warnings reported.
>>
>> If you do not use defconfig, then in all reports please mention the
>> differences (the best) or at least attach it.
>
> Argh.. Sorry about that.
> Big mistake from my side. I did want to
> upload my defconfig but forgot. Defconfig plays a key role because, as
> I mentioned in one of my previous email, it is a timing/race bug and
> if I do any much changes in my defconfig (i.e. enable ftrace for
> example or as little as add printk in qcom_rpmh_regulator code) then I
> can't reproduce this bug. So needless to say that I can't reproduce
> this bug with default arm64 defconfig.
>
> Please find my custom (but upstream) defconfig here
> https://bugs.linaro.org/attachment.cgi?id=1143 and prebuilt binaries
> here https://people.linaro.org/~amit.pundir/db845c-userdebug/rpmh_bug/.
> "fastboot flash boot ./boot.img-6.4-rc6 reboot" and/or a few (<5)
> reboots should be enough to trigger the crash.
>
> I have downloaded the initrd from here
> https://snapshots.linaro.org/96boards/dragonboard845c/linaro/debian/569/initrd.img-5.15.0-qcomlt-arm64
> but edited ramdisk/init to run "load_module" function early in the
> boot and ramdisk/conf/initramfs.conf has "MODULES=list" instead of
> "MODULES=most", where all the kernel modules are listed at
> /etc/initramfs-tools/modules.

So you have interconnect as a module - this is not a supported setup. It
might work if all the modules are loaded very early, or it might not.
Pinctrl is another driver which should be built-in.

With your defconfig I see the regular issue - console and system die
because of lack of interconnects, most likely. I don't see your WARNs - I
just see the usual hang. See:
https://lore.kernel.org/all/20221021032702.1340963-1-krzysztof.kozlowski@xxxxxxxxxx/

If you want them to really be modules, then you need to fix all the
dependencies (SOFTDEP?) and probe ordering glitches. It's not a problem of
DTS.

Just because something can be built as a module does not mean it will
work. We don't test it, we don't work with them as modules.
It's kind of the same as here:
https://lore.kernel.org/all/ac328b6a-a8e2-873d-4015-814cb4f5588e@xxxxxxxxxxxxx/

I understand that we might have a regression here, if these were working
as modules, but I don't think we ever really committed to it. We can as
well make it non-module to solve the regression.

Best regards,
Krzysztof