Hi Rafael, On Thu, 13 Mar 2025 at 18:35, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > On Thu, Mar 13, 2025 at 6:27 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > On Thu, Mar 13, 2025 at 5:48 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: > > > On Thu, 13 Mar 2025 at 15:32, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: > > > > On Thu, 13 Feb 2025 at 11:26, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: > > > > > On Thu, 13 Feb 2025 at 09:31, Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > > > > > On Mon, Feb 10, 2025 at 2:24 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: > > > > > > > On Fri, 7 Feb 2025 at 16:08, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: > > > > > > > > Instrumenting all dev->power.completion accesses in > > > > > > > > drivers/base/power/main.c reveals that resume is blocked in dpm_wait() > > > > > > > > in the call to wait_for_completion() for regulator-1p2v, which is > > > > > > > > indeed a dependency for the SN65DSI86 DSI-DP bridge. Comparing > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > Looking at /sys/devices/virtual/devlink, the non-working case has the > > > > > > > > following extra entries: > > > > > > > > > > > > > > Note that the SN65DSI86 DSI-DP bridge driver uses the auxiliary bus > > > > > > > to create four subdevices: > > > > > > > - ti_sn65dsi86.aux.0, > > > > > > > - ti_sn65dsi86.bridge.0, > > > > > > > - ti_sn65dsi86.gpio.0, > > > > > > > - ti_sn65dsi86.pwm.0. > > > > > > > None of them have supplier:* symlinks in sysfs, so perhaps that is > > > > > > > the root cause of the issue? > > > > > > > > > > > > Sorry, I haven't had time to look into this closely. Couple of > > > > > > questions/suggestions that might give you some answers. > > > > > > > > > > > > Is this an issue only happening for s2idle or for s2ram too? I'd guess > > > > > > both, but if not, that might tell you something? > > > > > > > > > > The two (very similar) boards I could reproduce the issue on do not > > > > > support s2ram yet. > > > > > > > > > > > The only reason the wait_for_completion() wouldn't work is because the > > > > > > supplier is not "completing"? > > > > > > > > > > Yes, the diff shows ca. 70 additional calls to "complete_all()" in the > > > > > good case. > > > > > > > 4. When the issue happens, /sys/devices/virtual/devlink shows 3 > > > > more links: > > > > A. platform:feb00000.display is a supplier of platform:fed80000.dsi-encoder > > > > B. platform:fed80000.dsi-encoder is a supplier of platform:feb00000.display > > > > > > Their status file report "dormant". > > > > > > > C. i2c:1-002c is a supplier of platform:fed80000.dsi-encoder > > > > > > Its status file reports "available". > > > > > > > 5. What happens in dpm_noirq_resume_devices()? > > > > > > > > /* > > > > * Trigger the resume of "async" devices upfront so they don't have to > > > > * wait for the "non-async" ones they don't depend on. > > > > */ > > > > i2c-1 (i2c bus) and 1-002c (bridge device) are async, > > > > thus triggered first. > > > > After that, the remaining devices are resumed. > > > > > > > > In the bad case: > > > > > > > > device_resume_noirq(fed80000.dsi-encoder, async=false) > > > > dpm_wait_for_superior() > > > > parent soc: skipping wait_for_completion() > > > > dpm_wait_for_suppliers() > > > > supplier feb00000.display: skipped, DL_STATE_DORMANT > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > Cfr. extra link A above (harmless) > > > > > > > > supplier e6150000.clock-controller: skipping wait_for_completion() > > > > supplier 1-002c: wait_for_completion() => BLOCKED > > > > ^^^^^^^^^^^^^^^ > > > > Cfr. extra link C above, but the bridge device hasn't been resumed yet. > > > > > > Changing the test for "DL_STATE_DORMANT"[1] in dpm_wait_for_suppliers() > > > to also include "DL_STATE_AVAILABLE" makes it skip supplier 1-002c, > > > and fixes the issue. Does that make sense? > > > > Good question. > > > > DL_STATE_AVAILABLE essentially means that the consumer hasn't been > > probed yet, but it doesn't mean that it can be suspended before its > > supplier. > > I really meant "resumed before its supplier", sorry for the confusion. > > Generally speaking, suppliers need to be resumed first regardless of > the status of the consumers. Exactly, and that's being violated here. Before resume, dpm_noirq_list contains: [...] - fed80000.dsi-encoder [...] - regulator-1p2v [...] - 1-002c which is the order in which devices are to be resumed. regulator-1p2v is a supplier of 1-002c: OK. 1-002c is a supplier of fed80000.dsi-encoder: NG. As devices are resumed in the inverse order they have been suspended before, suspend order is also wrong. Hence checking also for DL_STATE_AVAILABLE would just fix the symptom, and not the cause, right? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds