On Mon, Aug 14, 2017 at 11:43:34AM +0300, Jani Nikula wrote:
> On Fri, 11 Aug 2017, Clint Taylor <clinton.a.taylor@xxxxxxxxx> wrote:
> > On 08/11/2017 12:49 AM, Lofstedt, Marta wrote:
> >> We can't have this lspcon as an argument to the test; it will not
> >> work with automated testing using piglit as we do for CI.
> >> Theoretically we could add a "has_lspcon" to debugfs, but I believe
> >> this test has become overcomplicated.
> >
> > The test would need to know if an LSPcon is connected on a
> > port-by-port basis. This is easy if the LSPcon driver is loaded, but
> > in the case of USB_C->HDMI it gets a little more complicated (not
> > impossible) to figure out. Even if we know exactly what device is
> > connected, failures can still occur if the TCON/monitor clock
> > stretches the EDID read.
> >
> >> The intention of the test is to do a sanity check so that we don't
> >> drift off here, so I actually prefer the V1.
> >
> > Unfortunately, with the timing differences (3ms to 96ms) depending
> > on the monitor type connected and the EDID size, there is no way for
> > a one-size-fits-all sanity check to be reliable. If the original
> > point of this test was to figure out whether probe caused more than
> > one EDID read, maybe we should actually count EDID reads rather than
> > infer them by measuring time.
>
> I'll take a step back here and try to look at the big picture.
>
> I think the point of the test originally was to catch regressions in
> the EDID code that would slow down EDID reading. That's a valid thing
> to test per se, but unfortunately it's not easy to do accurately. The
> basic alternatives are, from most accurate to least accurate:
>
> 1) Start off with reference timing, and make relative comparisons
> against that. This is not unlike performance testing. The problem is,
> to avoid comparing apples and oranges, you need to take into account
> platform, connector type, adapters, cables, hubs, EDID size, and the
> display. Change one, and you can't make the comparison anymore. You
> end up tracking configurations and timings, a lot.
>
> 2) Try to make a reasonable estimate of the absolute maximum based on
> some subset of the configuration (such as connector type and
> adapters). If you stay below it, regardless of timing changes, you
> assume it's fine; otherwise you consider it a regression. So you try
> to keep the limits tight enough to catch regressions, but not flag
> normal behaviour too often. You end up accumulating heuristics for
> the configuration and timing, because, let's face it, there is a lot
> of variance in what's acceptable. (This is v2+ of the patch at hand.)
>
> 3) Try to make a reasonable estimate of the absolute maximum
> independent of the configuration. You'll never catch regressions in
> the fast configurations, because your limits are based on the worst
> case. You'll only catch the really pathological problems. Note that
> the current code uses the mean, so it evens out connector-type
> specific problems; we might also not catch pathological problems in,
> say, DP if the HDMI is fast. (This is the original and v1 of the
> patch.)
>
> I think 1) is more trouble than it's worth. 3) is simplistic but
> easy. The question is, do we get enough benefit from 2) to offset the
> complications?

Why is 2) hard? An EDID read takes 22ms at wire speed, per block. Add a
bit of fudge, and that gives us 10ms + 25ms * num_edid_blocks. IMO
everything else is a bug in our kernel driver.

I have no idea why lspcon would matter here, because it really
shouldn't.
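Concretely, the check could be as simple as the sketch below. This is
just an illustration of the 10ms + 25ms-per-block budget above; the
helper names are made up, not an existing igt API:

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	#define EDID_BLOCK_SIZE 128 /* bytes per EDID block */

	/*
	 * Absolute upper bound, in ms, for reading num_blocks EDID
	 * blocks: 10ms of fixed overhead plus 25ms per 128-byte block.
	 */
	static uint64_t edid_read_budget_ms(unsigned int num_blocks)
	{
		return 10 + 25 * (uint64_t)num_blocks;
	}

	/*
	 * Flag a regression if the measured probe time exceeds the
	 * budget for the EDID the connector actually reported.
	 */
	static bool edid_read_is_regression(uint64_t elapsed_ms,
					    size_t edid_len)
	{
		unsigned int num_blocks = edid_len / EDID_BLOCK_SIZE;

		return elapsed_ms > edid_read_budget_ms(num_blocks);
	}

With a bound like that there is no per-connector or per-adapter
heuristic to maintain: anything slower is a driver bug to fix, not a
limit to raise.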
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch