Re: [PATCH] drm/i915/display: Increase timeout for DP Aux channel ctl signal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 15, 2019 at 02:43:46PM -0700, Rodrigo Vivi wrote:
On Fri, Mar 15, 2019 at 02:39:25PM -0700, Rodrigo Vivi wrote:
On Fri, Mar 15, 2019 at 02:31:40PM -0700, Rodrigo Vivi wrote:
> On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote:
> > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote:
> > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote:
> > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP
> > > > Aux Channel Controller register.
> > > >
> > > > This is needed to address FDO #109982
> > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982
> > >
> > > instead of mentioning like this, please use:
> > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982
> > >
> > > Also instead of the "needed to address" it would be better
> > > to add some reasoning explaining that
> > > "empirically we got some bugs workarounded by increasing the timeout
> > > from 10ms to 15ms although spec was only requiring 4ms"
> > > or something like that...
> > >
> > Thanks! I'll keep these in mind for next patches.
> > > >
> > > > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > Cc: Imre Deak <imre.deak@xxxxxxxxx>
> > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@xxxxxxxxx>
> > > > ---
> > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> > > > index 47857f96c3b1..fd6de33c5664 100644
> > > > --- a/drivers/gpu/drm/i915/intel_dp.c
> > > > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp)
> > > > 	}
> > > > }
> > > >
> > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15
> > >
> > > This define is spurious since it's in use in a single place.
> > >
> > > Also, giving the timeout a name, like this, makes it appear it came from
> > > the spec. Well, if it came from Spec it should be defined in the proper .h
> > > files.
> > >
> > > Since I don't believe this came from spec I believe we can just remove it
> > > and go for the timeout directly on the function below.
> > >
> > I'll make this change in the next patch.
> > > > +
> > > > static u32
> > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp)
> > > > {
> > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp)
> > > >
> > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0)
> > > > 	done = wait_event_timeout(dev_priv->gmbus_wait_queue, C,
> > > > -				  msecs_to_jiffies_timeout(10));
> > > > +				  msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS));
> > >
> > > Is this just a guess that you are trying to check?
> > > I'm asking because I didn't see any indication that the increase
> > > really fixed the issue.
> > >
> > > So, if you are trying to just validate your approach maybe the try-bot
> > > could be used?
> > >
> > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once
> > in approximately 2 days. So the trybot is not of much help for this.
> >
> > Is there another way of testing experimental patches for bugs that are
> > not reproduced locally or on other machines? Or, if the issue is only
> > happening on one machine, should it be lowered in priority/whitelisted
> > on the specific CI machine?
>
> For now I believe the best alternative is to --subject-prefix="[CI]"
>
> But also you need to be sure that the fail rate don't fool us...

hmm... this testcase seems to have a passrate of 77% of the runs.
So probably using the CI infrastructure here isn't the best alternative
to validate this approach in general.

ops... I forgot to paste the link from these data:
https://intel-gfx-ci.01.org/tree/drm-tip/shard-iclb.html
(30 runs out of 45 passing)

Sorry, I don't think I understood what you are saying here. The CI/IGT
test that is reported in the bug itself doesn't have an issue. It pases
successfully.

The issue reported by Martin is that there is an error printed out to dmesg.
<3> [1128.098368] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did
not signal timeout!

I noticed that there is a similar failure on one other KBL system
(fi-kbl-7500u) where this issue occurs regularly. But on that platform
even after 5 try programming the DP Aux Channel Controller fails.

So, this might be a one off hardware platform issue?


>
> I believe there will be better alternatives for using CI for cases like
> this soon, so +Martin
>
> >
> > > Thanks,
> > > Rodrigo.
> > >
> > > >
> > > > 	/* just trace the final value */
> > > > 	trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true);
> > > > --
> > > > 2.20.1
> > > >
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux