On Tue, Oct 31, 2017 at 3:04 PM, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > On 31 October 2017 at 14:55, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote: >> Hi Rafael, Tero, >> >> CC pinchartl, dri-devel >> >> On Tue, Oct 31, 2017 at 2:10 PM, Geert Uytterhoeven >> <geert@xxxxxxxxxxxxxx> wrote: >>> CC linux-renesas-soc >>> >>> On Tue, Oct 31, 2017 at 2:09 PM, Geert Uytterhoeven >>> <geert@xxxxxxxxxxxxxx> wrote: >>>> On Tue, Oct 31, 2017 at 12:27 AM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: >>>>> On Monday, October 30, 2017 11:19:08 AM CET Rafael J. Wysocki wrote: >>>>>> On Mon, Oct 30, 2017 at 8:10 AM, Tero Kristo <t-kristo@xxxxxx> wrote: >>>>>> > The recent change to the PM QoS framework to introduce a proper >>>>>> > no constraint value overlooked to handle the devices which don't >>>>>> > implement PM QoS OPS. Runtime PM is one of the more severely >>>>>> > impacted subsystems, failing every attempt to runtime suspend >>>>>> > a device. This leads into some nasty second level issues like >>>>>> > probe failures and increased power consumption among other things. >>>>>> >>>>>> Oh, that's bad. >>>>>> >>>>>> Sorry about breaking it and thanks for the fix! >>>>>> >>>>>> > Fix this by adding a proper return value for devices that don't >>>>>> > implement PM QoS implicitly. >>>>>> > >>>>>> > Fixes: 0cc2b4e5a020 ("PM / QoS: Fix device resume latency PM QoS") >>>>>> > Signed-off-by: Tero Kristo <t-kristo@xxxxxx> >>>>>> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> >>>>>> >>>>>> Applied. >>>>> >>>>> And pushed to Linus. >>>> >>>> I'm afraid it is not sufficient. >>>> >>>> Commit 0cc2b4e5a020fc7f ("PM / QoS: Fix device resume latency PM QoS") >>>> introduced two issues on Renesas platforms: >>>> 1. After boot up, many devices have changed their state from "suspended" >>>> to "active", according to /sys/kernel/debug/pm_genpd/pm_genpd_summary >>>> (comparing that file across boots is one of my standard tests). >>>> Interestingly, doing a system suspend/resume cycle restores their state >>>> to "suspended". >>>> >>>> 2. During system suspend, the following warning is printed on >>>> r8a7791/koelsch: >>>> >>>> i2c-rcar e6530000.i2c: runtime PM trying to suspend device but >>>> active child >> >> 3. I've just bisected a seemingly unrelated issue to the same commit. >> On Salvator-XS with R-Car H3, initialization of the rcar-du driver now >> takes more than 1 minute due to flip_done time outs, while it took 0.12s >> before: >> >> [ 3.015035] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). >> [ 3.021721] [drm] No driver support for vblank timestamp query. >> [ 13.280738] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 23.520707] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 33.760708] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 44.000755] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 44.003597] Console: switching to colour frame buffer device 128x48 >> [ 54.240707] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 64.480706] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* >> [CRTC:58:crtc-3] flip_done timed out >> [ 64.544876] rcar-du feb00000.display: fb0: frame buffer device >> [ 64.552013] [drm] Initialized rcar-du 1.0.0 20130110 for >> feb00000.display on minor 0 >> [ 64.559873] [drm] Device feb00000.display probed >> >>>> Commit 2a9a86d5c81389cd ("PM / QoS: Fix default runtime_pm device resume >>>> latency") fixes the second issue, but not the first. >> >> ... nor the third. >> >>>> Reverting commits 2a9a86d5c81389cd ("PM / QoS: Fix default runtime_pm >>>> device resume latency") and 0cc2b4e5a020fc7f ("PM / QoS: Fix device resume >>>> latency PM QoS") fixes both. >> >> ... all three. >> >>>> Do you have a clue? >>>> Thanks! > > As I didn't have the time to review the original commit, before it got > pushed as a fix, I am planning to review it now instead. > > A vague guess is that the genpd governor prevents the device from > being suspended. That was also the most tricky part of the changes > from the original commit, which is causing problems. I think you are right. > I get back to this when I have reviewed it more thoroughly. Thanks, and sorry for breaking stuff. In retrospect I should just have not pushed it so late in the cycle. Rafael