Hi Saravana,

On Thu, 13 Feb 2025 at 11:26, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> On Thu, 13 Feb 2025 at 09:31, Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > On Mon, Feb 10, 2025 at 2:24 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > On Fri, 7 Feb 2025 at 16:08, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > Instrumenting all dev->power.completion accesses in
> > > > drivers/base/power/main.c reveals that resume is blocked in dpm_wait()
> > > > in the call to wait_for_completion() for regulator-1p2v, which is
> > > > indeed a dependency for the SN65DSI86 DSI-DP bridge. Comparing
> > >
> > > [...]
> > >
> > > > Looking at /sys/devices/virtual/devlink, the non-working case has the
> > > > following extra entries:
> > >
> > > Note that the SN65DSI86 DSI-DP bridge driver uses the auxiliary bus
> > > to create four subdevices:
> > >   - ti_sn65dsi86.aux.0,
> > >   - ti_sn65dsi86.bridge.0,
> > >   - ti_sn65dsi86.gpio.0,
> > >   - ti_sn65dsi86.pwm.0.
> > > None of them have supplier:* symlinks in sysfs, so perhaps that is
> > > the root cause of the issue?
> >
> > Sorry, I haven't had time to look into this closely. Couple of
> > questions/suggestions that might give you some answers.
> >
> > Is this an issue only happening for s2idle or for s2ram too? I'd guess
> > both, but if not, that might tell you something?
>
> The two (very similar) boards I could reproduce the issue on do not
> support s2ram yet.
>
> > The only reason the wait_for_completion() wouldn't work is because the
> > supplier is not "completing"?
>
> Yes, the diff shows ca. 70 additional calls to "complete_all()" in the
> good case.
>
> > There's some weird direct_complete logic
> > that I haven't fully understood. You can look at that to see if some
> > of the devices are skipping their resumes and hence the "completes"
> > too? Also, runtime PM and some flag can cause some lazy resume or
> > avoid suspending already suspended devices behavior. Check that too.
>
> Thanks, will give it a try...

More findings:

  1. The issue does not happen with "fw_devlink=off".
     It does happen with all of "fw_devlink=(permissive|on|rpm)".

  2. Looking at differences in direct_complete state didn't help.

  3. When the issue happens, two more dependency cycle fixes are printed:

         /soc/dsi-encoder@fed80000: Fixed dependency cycle(s) with /soc/i2c@e6508000/bridge@2c
         /soc/i2c@e6508000/bridge@2c: Fixed dependency cycle(s) with /soc/dsi-encoder@fed80000

     These are not new: the first one is printed 4 instead of 3 times,
     the second one is printed 3 instead of 2 times.

  4. When the issue happens, /sys/devices/virtual/devlink shows 3 more
     links:

       A. platform:feb00000.display is a supplier of
          platform:fed80000.dsi-encoder
       B. platform:fed80000.dsi-encoder is a supplier of
          platform:feb00000.display
       C. i2c:1-002c is a supplier of platform:fed80000.dsi-encoder

     A and B are due to endpoint links between ports of the display
     and dsi-encoder nodes.
     C is due to the endpoint links between ports1 of the bridge and
     dsi-encoder nodes.  However, I'd expect
     platform:fed80000.dsi-encoder to be a supplier of i2c:1-002c, too?

     Note that feb00000.display is one of the devices that were probe
     deferred, due to no driver for fed80000.dsi-encoder being available.
     The other device that was probe-deferred is
     ti_sn65dsi86.bridge.1068, which is an auxiliary-bus subdevice
     of i2c:1-002c.

  5. What happens in dpm_noirq_resume_devices()?

        /*
         * Trigger the resume of "async" devices upfront so they don't have to
         * wait for the "non-async" ones they don't depend on.
         */

     i2c-1 (i2c bus) and 1-002c (bridge device) are async, thus
     triggered first.  After that, the remaining devices are resumed.

     In the bad case:

        device_resume_noirq(fed80000.dsi-encoder, async=false)
          dpm_wait_for_superior()
            parent soc: skipping wait_for_completion()
          dpm_wait_for_suppliers()
            supplier feb00000.display: skipped, DL_STATE_DORMANT
            ^^^^^^^^^^^^^^^^^^^^^^^^^
            Cfr. extra link A above (harmless)
            supplier e6150000.clock-controller: skipping wait_for_completion()
            supplier 1-002c: wait_for_completion() => BLOCKED
            ^^^^^^^^^^^^^^^
            Cfr. extra link C above, but the bridge device hasn't been
            resumed yet.

     Then it continues resuming async devices:

        device_resume_noirq(i2c-1, async=true)
          dpm_wait_for_superior()
            parent e6508000.i2c: wait_for_completion(), completed
          dpm_wait_for_suppliers()
            (none)
          complete_all()

        device_resume_noirq(1-002c, async=true)
          dpm_wait_for_superior()
            parent i2c-1: wait_for_completion(), completed
          dpm_wait_for_suppliers
            supplier e6050000.pinctrl: wait_for_completion(), completed
            supplier regulator-1p2v: wait_for_completion() => BLOCKED
            ^^^^^^^^^^^^^^^^^^^^^^^
            The regulator hasn't been resumed yet.

     In the good case:

        device_resume_noirq(fed80000.dsi-encoder, async=false)
          dpm_wait_for_superior()
            parent soc: skipping wait_for_completion()
          dpm_wait_for_suppliers()
            supplier e6150000.clock-controller: skipping wait_for_completion()
          complete_all()
          ^^^^^^^^^^^^
          As feb00000.display and 1-002c are not suppliers,
          fed80000.dsi-encoder does not have to wait for them.

        [...]

        device_resume_noirq(regulator-1p2v, async=false)
                            ^^^^^^^^^^^^^^^
                            After a while, the regulator is resumed...
          dpm_wait_for_superior()
            parent platform: wait_for_completion()
          dpm_wait_for_suppliers()
            (none)
          complete_all()

        [...]

        device_resume_noirq(regulator.1, async=false)
                            ^^^^^^^^^^^^^^^
                            followed by the virtual counterpart.
          dpm_wait_for_superior()
            parent regulator-1p2v: skipping wait_for_completion()
          dpm_wait_for_suppliers ()
            (none)
          complete_all()

        [...]

        device_resume_noirq(1-002c, async=true)
                            ^^^^^^
                            The bridge is resumed much later...
          dpm_wait_for_superior()
            parent i2c-1: wait_for_completion(), completed
          dpm_wait_for_suppliers
            supplier e6050000.pinctrl: wait_for_completion(), completed
            supplier regulator-1p2v: wait_for_completion(), completed
            ^^^^^^^^^^^^^^^^^^^^^^^
            supplier regulator-1p8v: wait_for_completion(), completed
            supplier e6050980.gpio: wait_for_completion(), completed
            supplier e61c0000.interrupt-controller: wait_for_completion(), completed
            supplier regulator.1: wait_for_completion(), completed
            ^^^^^^^^^^^^^^^^^^^^
            ... after the regulators were resumed
            supplier regulator.2: wait_for_completion(), completed
          complete_all()

So the issue seems to be the creation of link C (i2c:1-002c is a
supplier of platform:fed80000.dsi-encoder).

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
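
For illustration, the blocking sequence from finding 5 can be replayed in a
toy model.  This is only a sketch, not kernel code: the Device tuple and the
simulate_resume() and build_devices() helpers are hypothetical names, and the
model assumes that async devices run concurrently while non-async devices are
resumed one at a time, in list order, by a single thread that blocks on the
first device whose suppliers or parent have not completed.  Only the four
devices relevant to the trace are modelled, and already-completed parents and
suppliers are left out.

```python
from collections import namedtuple

# Hypothetical model of one device in the resume list (not a kernel structure).
Device = namedtuple("Device", ["name", "is_async", "waits_for"])


def simulate_resume(devices):
    """Return (completed, stuck) device names for one resume phase.

    Assumption: async devices complete as soon as everything they wait for
    has completed; non-async devices are handled strictly in list order by
    one thread, which blocks on the first one that still has to wait.
    """
    done = []
    async_pending = [d for d in devices if d.is_async]
    sync_queue = [d for d in devices if not d.is_async]
    progress = True
    while progress:
        progress = False
        # Async devices: any whose dependencies have completed can finish.
        for dev in list(async_pending):
            if all(w in done for w in dev.waits_for):
                done.append(dev.name)
                async_pending.remove(dev)
                progress = True
        # Non-async devices: only the head of the queue may proceed.
        while sync_queue and all(w in done for w in sync_queue[0].waits_for):
            done.append(sync_queue.pop(0).name)
            progress = True
    stuck = [d.name for d in devices if d.name not in done]
    return done, stuck


def build_devices(with_link_c):
    """Build the four-device model, with or without extra link C."""
    encoder_waits = ["1-002c"] if with_link_c else []
    return [
        Device("i2c-1", True, []),
        Device("1-002c", True, ["i2c-1", "regulator-1p2v"]),
        Device("fed80000.dsi-encoder", False, encoder_waits),
        Device("regulator-1p2v", False, []),
    ]


for with_link_c in (True, False):
    completed, stuck = simulate_resume(build_devices(with_link_c))
    print("link C present:", with_link_c)
    print("  completed:", completed)
    print("  stuck:    ", stuck)
```

With link C present, only i2c-1 completes in this model: the non-async
dsi-encoder waits for the bridge, the bridge waits for regulator-1p2v, and the
regulator is never reached because it sits behind the dsi-encoder in the
non-async order.  Without link C all four devices complete, with the bridge
last.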