Re: PROBLEM: [drm:analogix_dp_bridge_atomic_enable [analogix_dp]] *ERROR* Failed to disable psr -110

Hi all again,

On Sat, 2020-04-25 at 23:11, Milan P. Stanić wrote:
> Hi Enric, Doug and all,
> On Fri, 2020-04-17 at 21:26, Milan P. Stanić wrote:
> > On Tue, 2020-04-14 at 20:22, Milan P. Stanić wrote:
> > > Yesterday I managed to build ChromeOS kernel version 4.4.174 and boot
> > > it without any serious problems.
> > > 
> > > Current uptime is over 21 hours and it works well, i.e. without any
> > > problem related to the rockchip-dp/analogix driver, even after a few
> > > suspend-to-RAM/resume cycles.
> > > 
> > > I will leave it running for a few days without shutdown (no poweroff or
> > > reboot) to see whether it keeps working or any problem appears.
> > > 
> > > (Besides this analogix issue, eMMC also seems to work fine with this
> > > kernel, although it doesn't work well with mainline kernels. But this is
> > > unrelated.)
> > > 
> > > If the machine works for three or more days without problems I will
> > > report back to you. Maybe someone experienced in video/GPU driver
> > > programming could diff the two and make it work with mainline kernels.
> > 
> > I built ChromeOS kernel 4.4.174 and after three days it still works fine
> > with respect to this analogix bridge problem.
> > 
> > It would be nice if someone with GPU/DRM programming knowledge would look
> > at the differences between this ChromeOS kernel and mainline to find the
> > cause of the problem.
> > 
> > I will try building mainline kernels going backward by major version
> > (5.4, 5.3, 5.2 and so on) to see whether one of the earlier ones doesn't
> > have this problem. This will take some time because the problem appears
> > randomly, sometimes a few minutes after boot but sometimes after a
> > day or two.
> 
> I've built the 5.2.1 kernel and tested it for three days of uptime (without
> shutdown or reboot). It worked without locking the display, but from
> time to time there are warnings in dmesg:
> -----------------
>     4.765133] rockchip-dp ff970000.edp: no DP phy configured
>  6524.939937] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 14481.854325] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 14565.881017] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 15793.280974] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 22474.968271] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 24054.391454] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 41126.507765] rockchip-dp ff970000.edp: AUX CH cmd reply timeout!
> 43526.604191] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 111807.839641] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 112710.959799] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 113122.383232] rockchip-dp ff970000.edp: Failed to apply PSR -110
> 113205.260384] rockchip-dp ff970000.edp: AUX CH cmd reply timeout!
> 113379.609998] rockchip-dp ff970000.edp: Failed to apply PSR -110
> -------------------
> 
> So it works, although with these slightly annoying warnings. It is
> stable, with no other issues, across about 5 to 9 suspend-to-RAM/resume
> cycles.
> 
> Then I built the 5.3.1 kernel; its current uptime is five days. It
> locked the display on the first day, but a suspend-to-RAM and resume
> unlocked it, and since then I have not seen the display lock once. Also,
> suspend-to-RAM/resume has worked fine for five days without poweroff/reboot.
> 
> But I still get annoying warnings in the dmesg output similar to the ones
> above (and I don't think it makes sense to paste them again here).
> 
> Tomorrow I will build the 5.4.1 kernel and test it for a few days, in the
> hope of finding the kernel version at which the problem became serious.

I built kernel 5.4.1 over the weekend and tested it for this problem. It
locked the display twice in the first days, so the problem was probably
introduced between kernel versions 5.3 and 5.4.
I looked at the diff in the kernel tree with:
git log -p v5.3..v5.4 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
and I see one big change, which I don't dare to revert or modify because
I don't know anything about GPU/DRM device driver programming.

My knowledge stops here.

Please, if someone can fix this and post patches I'm ready to test.
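
To make the comparison between kernel versions a bit more objective, the
rockchip-dp warnings from dmesg (like the 5.2.1 excerpt quoted above) could be
counted per kernel with a small script. The following is only a rough Python
sketch: the names summarize() and gaps() and the regular expression are
illustrative assumptions, not an existing tool, and it only understands
rockchip-dp lines in the format shown in that excerpt.

```python
import re
from collections import Counter

# Matches e.g. "[ 6524.939937] rockchip-dp ff970000.edp: Failed to apply PSR -110".
# The leading "[" and any "> " mail quoting are optional, because the excerpt
# quoted in this thread lacks the bracket (assumption about the input format).
LINE_RE = re.compile(r"^[>\s]*\[?\s*(\d+\.\d+)\]\s+rockchip-dp\s+\S+:\s+(.*)$")


def summarize(lines):
    """Return (Counter of rockchip-dp messages, sorted PSR failure timestamps)."""
    counts = Counter()
    psr_times = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a rockchip-dp line, ignore it
        stamp = float(match.group(1))
        message = match.group(2).strip()
        counts[message] += 1
        if message.startswith("Failed to apply PSR"):
            psr_times.append(stamp)
    return counts, sorted(psr_times)


def gaps(times):
    """Seconds between consecutive timestamps."""
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]


# Example with three lines copied from the 5.2.1 excerpt above.
sample = [
    " 6524.939937] rockchip-dp ff970000.edp: Failed to apply PSR -110",
    "14481.854325] rockchip-dp ff970000.edp: Failed to apply PSR -110",
    "41126.507765] rockchip-dp ff970000.edp: AUX CH cmd reply timeout!",
]
counts, psr_times = summarize(sample)
for message, number in counts.most_common():
    print(number, message)
print("seconds between PSR failures:", gaps(psr_times))
```

Feeding it the saved dmesg output of each tested kernel (for example by
passing open(path) to summarize(), with the path being whatever file the log
was saved to) would give counts that can be compared across 5.2.1, 5.3.1 and
5.4.1 instead of relying on impressions.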

-- 
Kind regards 
> -- 
> Kind regards
> 
> > -- 
> > Regards
> > 
> > > Thank you for the help
> > > 
> > > On Tue, 2020-04-14 at 18:17, Enric Balletbo Serra wrote:
> > > > Hi Doug and Milan,
> > > > 
> > > > Thanks for providing this information.
> > > > 
> > > > On Mon, 13 Apr 2020 at 17:23, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Fri, Apr 10, 2020 at 12:29 PM Milan P. Stanić <mps@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Fri, 2020-04-10 at 08:28, Doug Anderson wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Fri, Apr 10, 2020 at 5:56 AM Enric Balletbo Serra
> > > > > > > <eballetbo@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Hi Milan,
> > > > > > > >
> > > > > > > > > Right, this is an annoying but known issue. Unfortunately, I
> > > > > > > > > personally haven't had time to look at it, but it is on my TODO list.
> > > > > > >
> > > > > > > Random shot in the dark, but any chance somehow your PHY clock and
> > > > > > > PCLK for the eDP don't match?  If they don't then (IIRC) you'll get
> > > > > > > random failures to access eDP registers.
> > > > > > >
> > > > > > > Some history in <https://crrev.com/c/433393>.  It looks like the
> > > > > > > changes in that patch are upstream but if something else happened to
> > > > > > > make your PHY and PCLK mismatch it could cause similar symptoms.
> > > > > > >
> > > > > > > ...of course it's always possible (probable) that it's something
> > > > > > > different, but since that was such a weird and hard-to-track-down
> > > > > > > problem I figured I'd at least make sure it wasn't that.
> > > > > >
> > > > > > > I'm not sure I understood (I'm not a graphics hardware programmer), but
> > > > > > > I changed the arch/arm64/boot/dts/rockchip/rk3399.dtsi file around line
> > > > > > > 1367 (current mainline kernel), this:
> > > > > >     assigned-clocks =
> > > > > >       <&cru PLL_GPLL>, <&cru PLL_CPLL>,
> > > > > >       <&cru PLL_NPLL>,
> > > > > >       <&cru ACLK_PERIHP>, <&cru HCLK_PERIHP>,
> > > > > >       <&cru PCLK_PERIHP>,
> > > > > >       <&cru ACLK_PERILP0>, <&cru HCLK_PERILP0>,
> > > > > >       <&cru PCLK_PERILP0>, <&cru ACLK_CCI>,
> > > > > >       <&cru HCLK_PERILP1>, <&cru PCLK_PERILP1>,
> > > > > >       <&cru ACLK_VIO>, <&cru ACLK_HDCP>,
> > > > > >       <&cru ACLK_GIC_PRE>,
> > > > > >       <&cru PCLK_DDR>;
> > > > > >     assigned-clock-rates =
> > > > > >        <594000000>,  <800000000>,
> > > > > >       <1000000000>,
> > > > > >        <150000000>,   <75000000>,
> > > > > >         <37500000>,
> > > > > >        <100000000>,  <100000000>,
> > > > > >         <50000000>, <600000000>,
> > > > > >        <100000000>,   <50000000>,
> > > > > >        <400000000>, <400000000>,
> > > > > >        <200000000>,
> > > > > >        <200000000>;
> > > > > >
> > > > > > > and changed <594000000> to <600000000>, then built the kernel. It
> > > > > > > boots, but the display is blank after boot.
> > > > >
> > > > > I think kevin already overrides those clocks in its dts.  I was more
> > > > > thinking of looking at "/sys/kernel/debug/clk/clk_summary" and seeing
> > > > > if there was a clock mismatch.
> > > > >
> > > > 
> > > > Although I don't rule out that this could be the problem, I think it is
> > > > more likely a race in tracking the crtc active and self_refresh_active
> > > > state during the suspend path and PSR. I.e., if I apply the following
> > > > patch, which sets a fixed delay of 100 ms on the delayed entry work that
> > > > enters the PSR state (similar to what we had before the commit I
> > > > mentioned), suspend/resume works as expected for me.
> > > > 
> > > > @@ -218,7 +234,7 @@ void drm_self_refresh_helper_alter_state(struct drm_atomic_state *state)
> > > >                 mutex_unlock(&sr_data->avg_mutex);
> > > > 
> > > >                 mod_delayed_work(system_wq, &sr_data->entry_work,
> > > > -                                msecs_to_jiffies(delay));
> > > > +                                msecs_to_jiffies(100));
> > > >         }
> > > >  }
> > > > 
> > > > Some more info: I was not able to reproduce the problem by triggering
> > > > an 'echo mem > /sys/power/state'. The only way I can reproduce the issue
> > > > is with the 'systemctl suspend' command, which, if I am not mistaken,
> > > > does a DPMS off before suspending.
> > > > 
> > > > - Enric
> > > > 
> > > > > -Doug

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-rockchip



