Re: s2idle blocked on dev->power.completion

Hi Saravana,

On Thu, 13 Feb 2025 at 11:26, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> On Thu, 13 Feb 2025 at 09:31, Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > On Mon, Feb 10, 2025 at 2:24 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > On Fri, 7 Feb 2025 at 16:08, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > Instrumenting all dev->power.completion accesses in
> > > > drivers/base/power/main.c reveals that resume is blocked in dpm_wait()
> > > > in the call to wait_for_completion() for regulator-1p2v, which is
> > > > indeed a dependency for the SN65DSI86 DSI-DP bridge.  Comparing
> > >
> > > [...]
> > >
> > > > Looking at /sys/devices/virtual/devlink, the non-working case has the
> > > > following extra entries:
> > >
> > > Note that the SN65DSI86 DSI-DP bridge driver uses the auxiliary bus
> > > to create four subdevices:
> > >   - ti_sn65dsi86.aux.0,
> > >   - ti_sn65dsi86.bridge.0,
> > >   - ti_sn65dsi86.gpio.0,
> > >   - ti_sn65dsi86.pwm.0.
> > > None of them have supplier:* symlinks in sysfs, so perhaps that is
> > > the root cause of the issue?
> >
> > Sorry, I haven't had time to look into this closely. Couple of
> > questions/suggestions that might give you some answers.
> >
> > Is this an issue only happening for s2idle or for s2ram too? I'd guess
> > both, but if not, that might tell you something?
>
> The two (very similar) boards I could reproduce the issue on do not
> support s2ram yet.
>
> > The only reason the wait_for_completion() wouldn't work is because the
> > supplier is not "completing"?
>
> Yes, the diff shows ca. 70 additional calls to "complete_all()" in the
> good case.
>
> > There's some weird direct_complete logic
> > that I haven't fully understood. You can look at that to see if some
> > of the devices are skipping their resumes and hence the "completes"
> > too? Also, runtime PM and some flag can cause some lazy resume or
> > avoid suspending already suspended devices behavior. Check that too.
>
> Thanks, will give it a try...
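For reference, this is the primitive the whole issue hinges on. dev->power.completion is a struct completion. dpm_wait() blocks in wait_for_completion() until the superior device has been resumed and complete_all() has been called on it (see the traces below). After complete_all(), every current and future waiter passes until the completion is re-initialized. The userspace sketch below models only those semantics with pthreads. sim_completion and its helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the kernel's struct completion, covering
 * only the wait_for_completion()/complete_all() semantics used by dpm_wait().
 */
struct sim_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;	/* set by complete_all(); stays set until re-initialized */
};

static void sim_init_completion(struct sim_completion *c)
{
	int rc = pthread_mutex_init(&c->lock, NULL);

	assert(rc == 0);
	rc = pthread_cond_init(&c->cond, NULL);
	assert(rc == 0);
	(void)rc;
	c->done = false;
}

/* Like complete_all(): release all current waiters and all future ones. */
static void sim_complete_all(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion(): sleep until someone has called complete_all(). */
static void sim_wait_for_completion(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* A "supplier" resume running in another thread, ending with complete_all(). */
static void *sim_resume_supplier(void *arg)
{
	sim_complete_all(arg);
	return NULL;
}
```

If the supplier thread never reaches sim_complete_all(), the waiting consumer sleeps forever. That is exactly the symptom of a supplier that is never resumed.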

More findings:

1. The issue does not happen with "fw_devlink=off".
   It does happen with all of "fw_devlink=(permissive|on|rpm)".

2. Looking at differences in direct_complete state didn't help.

3. When the issue happens, two more "Fixed dependency cycle(s)" messages
   are printed:

       /soc/dsi-encoder@fed80000: Fixed dependency cycle(s) with /soc/i2c@e6508000/bridge@2c
       /soc/i2c@e6508000/bridge@2c: Fixed dependency cycle(s) with /soc/dsi-encoder@fed80000

   These messages are not new as such: the first one is printed 4 instead
   of 3 times, and the second one 3 instead of 2 times.

4. When the issue happens, /sys/devices/virtual/devlink shows three
   more links:
     A. platform:feb00000.display is a supplier of platform:fed80000.dsi-encoder
     B. platform:fed80000.dsi-encoder is a supplier of platform:feb00000.display
     C. i2c:1-002c is a supplier of platform:fed80000.dsi-encoder

   A and B are due to the endpoint links between the ports of the display
   and dsi-encoder nodes.
   C is due to the endpoint links between the ports of the bridge and
   dsi-encoder nodes. However, I would expect platform:fed80000.dsi-encoder
   to be a supplier of i2c:1-002c, too?

   Note that feb00000.display is one of the devices whose probe was
   deferred, because no driver for fed80000.dsi-encoder was available.
   The other device whose probe was deferred is ti_sn65dsi86.bridge.1068,
   which is an auxiliary-bus subdevice of i2c:1-002c.

5. What happens in dpm_noirq_resume_devices()?

       /*
        * Trigger the resume of "async" devices upfront so they don't have to
        * wait for the "non-async" ones they don't depend on.
        */

   i2c-1 (the i2c bus) and 1-002c (the bridge device) are async, and are
   thus triggered first.
   After that, the remaining non-async devices are resumed one by one, in
   list order.

     In the bad case:

       device_resume_noirq(fed80000.dsi-encoder, async=false)
         dpm_wait_for_superior()
           parent soc: skipping wait_for_completion()
           dpm_wait_for_suppliers()
             supplier feb00000.display: skipped, DL_STATE_DORMANT
             ^^^^^^^^^^^^^^^^^^^^^^^^^
Cfr. extra link A above (harmless)

             supplier e6150000.clock-controller: skipping wait_for_completion()
             supplier 1-002c: wait_for_completion() => BLOCKED
             ^^^^^^^^^^^^^^^
Cfr. extra link C above, but the bridge device hasn't been resumed yet.

Meanwhile, the async devices are being resumed:

       device_resume_noirq(i2c-1, async=true)
         dpm_wait_for_superior()
           parent e6508000.i2c: wait_for_completion(), completed
           dpm_wait_for_suppliers()
             (none)
         complete_all()

       device_resume_noirq(1-002c, async=true)
         dpm_wait_for_superior()
           parent i2c-1: wait_for_completion(), completed
           dpm_wait_for_suppliers()
             supplier e6050000.pinctrl: wait_for_completion(), completed
             supplier regulator-1p2v: wait_for_completion() => BLOCKED
             ^^^^^^^^^^^^^^^^^^^^^^^
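The regulator hasn't been resumed yet, and never will be: it is not async,
so it is only resumed after fed80000.dsi-encoder, which is still blocked
waiting for 1-002c.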

     In the good case:

       device_resume_noirq(fed80000.dsi-encoder, async=false)
         dpm_wait_for_superior()
           parent soc: skipping wait_for_completion()
           dpm_wait_for_suppliers()
             supplier e6150000.clock-controller: skipping wait_for_completion()
         complete_all()
         ^^^^^^^^^^^^
As feb00000.display and 1-002c are not suppliers, fed80000.dsi-encoder
does not have to wait for them.

       [...]

       device_resume_noirq(regulator-1p2v, async=false)
                           ^^^^^^^^^^^^^^
After a while, the regulator is resumed...

         dpm_wait_for_superior()
           parent platform: wait_for_completion()
           dpm_wait_for_suppliers()
             (none)
         complete_all()

       [...]

       device_resume_noirq(regulator.1, async=false)
                           ^^^^^^^^^^^
... followed by its virtual counterpart.
         dpm_wait_for_superior()
           parent regulator-1p2v: skipping wait_for_completion()
           dpm_wait_for_suppliers()
             (none)
         complete_all()

       [...]

       device_resume_noirq(1-002c, async=true)
                           ^^^^^^
The bridge is resumed much later...

         dpm_wait_for_superior()
           parent i2c-1: wait_for_completion(), completed
           dpm_wait_for_suppliers()
             supplier e6050000.pinctrl: wait_for_completion(), completed
             supplier regulator-1p2v: wait_for_completion(), completed
             ^^^^^^^^^^^^^^^^^^^^^^^
             supplier regulator-1p8v: wait_for_completion(), completed
             supplier e6050980.gpio: wait_for_completion(), completed
             supplier e61c0000.interrupt-controller: wait_for_completion(), completed
             supplier regulator.1: wait_for_completion(), completed
             ^^^^^^^^^^^^^^^^^^^^
... after the regulators were resumed
             supplier regulator.2: wait_for_completion(), completed
         complete_all()
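To double-check the reasoning, the ordering in the two traces can be reduced to a toy model. In this model, non-async devices are resumed one at a time, in list order, by a single thread, each only after all its superiors (parent and suppliers) have completed. Async devices complete as soon as their superiors have. This is a hypothetical, heavily simplified sketch, not kernel code. struct sim_dev, build_topology() and simulate_resume() are made-up names. The device subset is limited to the devices from the traces, regulator-1p2v's "platform" parent is omitted, and regulator-1p2v is placed after fed80000.dsi-encoder in the list, as observed in the good case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the dpm list, in list order (indices into devs[]). */
enum { SOC, CLK, I2C_CTRL, PINCTRL, DSI, REG_1P2V, I2C_1, BRIDGE, N_DEVS };

#define MAX_SUPERIORS 4

struct sim_dev {
	bool async;			/* triggered upfront, resumed by a worker */
	int superiors[MAX_SUPERIORS];	/* parent and suppliers to wait for */
	int n_superiors;
	bool done;			/* models complete_all() having been called */
};

static void add_superior(struct sim_dev *devs, int consumer, int superior)
{
	assert(devs[consumer].n_superiors < MAX_SUPERIORS);
	devs[consumer].superiors[devs[consumer].n_superiors++] = superior;
}

/* Topology from the traces; with_link_c adds "1-002c supplies dsi-encoder". */
static void build_topology(struct sim_dev *devs, bool with_link_c)
{
	memset(devs, 0, N_DEVS * sizeof(*devs));
	devs[I2C_1].async = true;
	devs[BRIDGE].async = true;

	add_superior(devs, CLK, SOC);
	add_superior(devs, I2C_CTRL, SOC);
	add_superior(devs, PINCTRL, SOC);
	add_superior(devs, DSI, SOC);
	add_superior(devs, DSI, CLK);
	add_superior(devs, I2C_1, I2C_CTRL);
	add_superior(devs, BRIDGE, I2C_1);
	add_superior(devs, BRIDGE, PINCTRL);
	add_superior(devs, BRIDGE, REG_1P2V);
	if (with_link_c)
		add_superior(devs, DSI, BRIDGE);
}

static bool superiors_done(const struct sim_dev *devs, int i)
{
	for (int k = 0; k < devs[i].n_superiors; k++)
		if (!devs[devs[i].superiors[k]].done)
			return false;
	return true;
}

/* Returns the first device (in list order) that never resumes, or -1. */
static int simulate_resume(struct sim_dev *devs)
{
	int next_sync = 0;	/* position of the single synchronous "thread" */
	bool progress = true;

	while (progress) {
		progress = false;

		/* Async workers: each finishes once all its superiors have. */
		for (int i = 0; i < N_DEVS; i++) {
			if (devs[i].async && !devs[i].done && superiors_done(devs, i)) {
				devs[i].done = true;
				progress = true;
			}
		}

		/* Synchronous resume: strictly one device at a time, in list order. */
		while (next_sync < N_DEVS && devs[next_sync].async)
			next_sync++;
		if (next_sync < N_DEVS && superiors_done(devs, next_sync)) {
			devs[next_sync].done = true;
			next_sync++;
			progress = true;
		}
	}

	for (int i = 0; i < N_DEVS; i++)
		if (!devs[i].done)
			return i;
	return -1;
}
```

In this model, build_topology(devs, true) makes simulate_resume() stop at DSI, with regulator-1p2v and 1-002c still pending, matching the bad trace. Without link C, every device completes.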

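So the issue seems to be the creation of link C (i2c:1-002c is a
supplier of platform:fed80000.dsi-encoder): the non-async resume of
fed80000.dsi-encoder then waits for the async bridge 1-002c, which in
turn waits for regulator-1p2v, which is only resumed after
fed80000.dsi-encoder. Hence none of them ever completes.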
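The same conclusion can be phrased as a cycle in a wait-for graph. Add an edge from every device to each parent or supplier it waits for. Also add an edge from every non-async device to the previous non-async device in list order, because those are resumed one after another. In this simplified model, any cycle means resume can never finish. A minimal sketch using a plain depth-first search follows. struct wait_graph and the wg_*() helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 16

/* wait_for[a][b] == true means "a cannot finish resuming before b has". */
struct wait_graph {
	int n;
	bool async[MAX_DEVS];
	bool wait_for[MAX_DEVS][MAX_DEVS];
};

static void wg_init(struct wait_graph *g, int n)
{
	assert(n > 0 && n <= MAX_DEVS);
	memset(g, 0, sizeof(*g));
	g->n = n;
}

/* Parent or supplier dependency: 'consumer' waits for 'superior'. */
static void wg_depends_on(struct wait_graph *g, int consumer, int superior)
{
	g->wait_for[consumer][superior] = true;
}

/* Non-async devices are resumed sequentially, so each waits for the previous one. */
static void wg_add_sync_order(struct wait_graph *g)
{
	int prev = -1;

	for (int i = 0; i < g->n; i++) {
		if (g->async[i])
			continue;
		if (prev >= 0)
			g->wait_for[i][prev] = true;
		prev = i;
	}
}

enum { WHITE, GREY, BLACK };	/* unvisited, on DFS stack, finished */

static bool dfs_finds_cycle(const struct wait_graph *g, int v, int *color)
{
	color[v] = GREY;
	for (int w = 0; w < g->n; w++) {
		if (!g->wait_for[v][w])
			continue;
		if (color[w] == GREY)
			return true;	/* back edge: v transitively waits on itself */
		if (color[w] == WHITE && dfs_finds_cycle(g, w, color))
			return true;
	}
	color[v] = BLACK;
	return false;
}

static bool wg_has_deadlock(const struct wait_graph *g)
{
	int color[MAX_DEVS] = { WHITE };

	for (int v = 0; v < g->n; v++)
		if (color[v] == WHITE && dfs_finds_cycle(g, v, color))
			return true;
	return false;
}
```

Take three nodes in list order: fed80000.dsi-encoder (0), regulator-1p2v (1), and the async bridge 1-002c (2). The graph is acyclic until link C adds 0 -> 2, which closes the cycle 0 -> 2 -> 1 -> 0.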
Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
