Re: [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s

On Mon, 2017-09-04 at 11:45 +0100, Chris Wilson wrote:
> Quoting Paulo Zanoni (2017-09-01 20:12:01)
> > On Fri, 2017-08-25 at 14:11 +0100, Chris Wilson wrote:
> > > Quoting Lofstedt, Marta (2017-08-25 13:50:16)
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Lofstedt, Marta
> > > > > Sent: Friday, August 25, 2017 2:54 PM
> > > > > To: 'Chris Wilson' <chris@xxxxxxxxxxxxxxxxxx>; intel-gfx@lists.freedesktop.org
> > > > > Subject: RE: [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s
> > > > > 
> > > > > 
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > > > > > Sent: Friday, August 25, 2017 1:47 PM
> > > > > > To: Lofstedt, Marta <marta.lofstedt@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > Subject: Re: [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s
> > > > > > 
> > > > > > Quoting Marta Lofstedt (2017-08-25 11:40:29)
> > > > > > > From: "Lofstedt, Marta" <marta.lofstedt@xxxxxxxxx>
> > > > > > > 
> > > > > > > The igt@kms_frontbuffer_tracking@fbc-*draw* subtests
> > > > > > > have inconsistent results, flip-flopping between fail and
> > > > > > > pass. The failures are always due to "FBC disabled".
> > > > > > > With this increased timeout the flip-flop behavior is no
> > > > > > > longer reproducible.
> > > > > > > 
> > > > > > > This is a partial revert of commit
> > > > > > > 64590c7b768dc8d8dd962f812d5ff5a39e7e8b54, where the
> > > > > > > timeout was decreased from 5s to 2s. After investigating
> > > > > > > the timeout needed, the conclusion is that the longer
> > > > > > > timeout is only needed when the test swaps between some
> > > > > > > specific draw domains, typically blt vs. mmap_cpu. The
> > > > > > > objective of the FBC part of the tests is not to benchmark
> > > > > > > draw domain changes; it is to check that FBC was
> > > > > > > (re-)enabled.
> > > > > > > 
> > > > > > > V2: Added documentation
> > > > > > > 
> > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101623
> > > > > > > Signed-off-by: Marta Lofstedt <marta.lofstedt@xxxxxxxxx>
> > > > > > > Acked-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> > > > > > > ---
> > > > > > >  tests/kms_frontbuffer_tracking.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/tests/kms_frontbuffer_tracking.c
> > > > > > > b/tests/kms_frontbuffer_tracking.c
> > > > > > > index e03524f1..2538450c 100644
> > > > > > > --- a/tests/kms_frontbuffer_tracking.c
> > > > > > > +++ b/tests/kms_frontbuffer_tracking.c
> > > > > > > @@ -924,7 +924,7 @@ static bool fbc_stride_not_supported(void)
> > > > > > > 
> > > > > > >  static bool fbc_wait_until_enabled(void)
> > > > > > >  {
> > > > > > 
> > > > > > Try igt_drop_caches_set(device, DROP_RETIRE); instead of
> > > > > > relaxing the
> > > > > > timeout.
> > > > > > -Chris
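
(For the archive: inside the test, that suggestion would look roughly
like the sketch below. Only igt_drop_caches_set()/DROP_RETIRE and
igt_wait() are existing IGT interfaces; fbc_is_enabled(), drm.fd and
the 2000 ms poll are my assumptions about the surrounding test code,
not lines taken from the patch.)

static bool fbc_wait_until_enabled(void)
{
	/* Kick a retire pass first: the frontbuffer flush that lets FBC
	 * be re-enabled only happens on request retirement, which is
	 * otherwise lazy. */
	igt_drop_caches_set(drm.fd, DROP_RETIRE);

	return igt_wait(fbc_is_enabled(), 2000, 1);
}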
> > > > > 
> > > > > OK, I will test that and do a V3 if it works!
> > > > > /Marta
> > > > 
> > > > I did some initial testing with igt_drop_caches_set inside
> > > > fbc_wait_until_enabled and it looks good; I will add this to my
> > > > weekend tests to get more results. This also appears to improve
> > > > the runtime of the tests quite a bit. So maybe igt_drop_caches_set
> > > > should be placed somewhere else so it gives runtime improvements
> > > > not only for the FBC-related subtests.
> > > 
> > > Sure, all the waits can do with the retire first; give it a
> > > common function and a comment for the rationale (which should be
> > > pretty much the same as given in the changelog).
> > 
> > We can do that, sure, especially if it makes the tests faster...
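
Something like this, I suppose (untested sketch; the helper name, the
function-pointer shape and drm.fd are mine, only igt_drop_caches_set(),
DROP_RETIRE and igt_wait() are existing IGT interfaces):

/*
 * FBC, PSR and friends are only re-enabled by the frontbuffer flush
 * that follows request retirement. Retirement is lazy: it is normally
 * driven by further GPU activity, with only a low-frequency kworker as
 * a fallback, so force a retire pass before polling instead of waiting
 * several seconds for the worker.
 */
static bool feature_wait_until_enabled(bool (*is_enabled)(void),
				       int timeout_ms)
{
	igt_drop_caches_set(drm.fd, DROP_RETIRE);

	return igt_wait(is_enabled(), timeout_ms, 1);
}

fbc_wait_until_enabled() and the PSR equivalent would then just call it
with their own predicate.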
> > 
> > > Anytime we use the GPU to invalidate
> > > the frontbuffer tracking, we have to wait for a retire to do the
> > > flush.
> > > Retirement is lazy, and is normally driven by GPU activity but we
> > > have a
> > > background kworker to make sure we notice when the system becomes
> > > idle
> > > independent of userspace - except it's low frequency.
> > 
> > ... but our current 2s timeout should have been enough for that,
> > shouldn't it? If I'm looking at the right part of the code,
> > retirement should happen once per second, so 2s should have been
> > enough. But it looks like it's not.
> > 
> > Unless I'm misinterpreting the round_up part, which could convert
> > the 1s to 2s, which would still probably be fine...
> 
> It can bump the wait by up to a second (it tries to align wakeups on
> second boundaries). And we may skip the work if the device is busy
> elsewhere.

Okay, so you're saying that there's no timeout, no matter how many
seconds long, that will guarantee the retire handler has run, even in
IGT's limited environment where the only DRM client running is
kms_frontbuffer_tracking? If the answer is yes, then we definitely need
to patch kms_frontbuffer_tracking and do something about it. My
assumption was that 2s (or the 5s proposed here) would be enough.
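
For reference, the retire worker re-arm I was looking at is roughly the
following (paraphrased from memory, not an exact quote of
i915_gem_retire_work_handler()):

	/* Nominal period is HZ (1s), but round_jiffies_up_relative()
	 * aligns the wakeup to the next full second, so consecutive
	 * runs can end up almost 2s apart. */
	if (READ_ONCE(dev_priv->gt.awake))
		queue_delayed_work(dev_priv->wq,
				   &dev_priv->gt.retire_work,
				   round_jiffies_up_relative(HZ));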

Of course, since this is CI we need a 100% guarantee; 99.99999% is
unacceptable.

> 
> > Anyway, 3s looks definitely safe even in this case. Maybe we could
> > go with 3s?
> > 
> > We can both increase the timeout *and* do cache dropping. Although I
> > think not doing the cache dropping is definitely something that
> > needs to be tested, so doing the cache dropping every time may not
> > be a good idea.
> 
> You are not dropping the caches; it is just doing a retire.
> 
> The real question is: what is the expectation? If we want the test
> to simply state that, when ready, FBC et al will be re-enabled, then
> just add a synchronous debugfs interface that establishes the
> condition in the driver that FBC should be ready (atm that is
> DROP_RETIRE, but you will probably want a better-specified knob).

As much as that's a valid option, I'd prefer to do something that
doesn't require adding more complex, non-standard interactions between
kms_frontbuffer_tracking and the kernel.


> If the test is to make sure that FBC is
> reenabled automatically, 

We definitely want to check that. A bug in how we receive/treat the
frontbuffer invalidate/flush calls could cause FBC to never be enabled
again.


> then we need to think some more. In a normal
> workload, this should be the case (since the retire worker you rely
> on
> is for hostile userspace). If you simply look at the hostile
> userspace
> (and you already are for the frontbuffer writes), then a longer
> timeout
> is definitely acceptable, but how long? What is that limit?
> 
> If you define an upper bound for how long you allow fbc et al to
> remain
> off, then we will need an explicit timer to match.

See above. I thought there existed an amount of time that we could wait
which would guarantee the retire handler would have run.

> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



