Re: [PATCH i-g-t] tests/gem_reset_stats.c: fix "ban" tests with scheduler

Daniel Vetter <daniel@xxxxxxxx> · Mon, 13 Jul 2015 17:00:55 +0200

On Mon, Jul 13, 2015 at 10:09:59AM +0000, Gore, Tim wrote:
> 
> 
> Tim Gore 
> Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, July 13, 2015 10:35 AM
> > To: Gore, Tim
> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Wood, Thomas; Kuoppala, Mika
> > Subject: Re:  [PATCH i-g-t] tests/gem_reset_stats.c: fix "ban" tests
> > with scheduler
> > 
> > On Fri, Jul 10, 2015 at 02:26:59PM +0100, tim.gore@xxxxxxxxx wrote:
> > > From: Tim Gore <tim.gore@xxxxxxxxx>
> > >
> > > The tests for context banning fail when the gpu scheduler is enabled.
> > > The test causes a hang (using an infinite loop
> > > batch) and then queues up some work behind it on both the hanging
> > > context and also on a second "good" context. On the "good" context it
> > > queues up 2 batch buffers. After the hanging ring has been reset (not
> > > a full gpu reset) the test checks the values of batch_active and
> > > batch_pending returned by the i915_get_reset_stats_ioctl. For the
> > > "good"
> > > context it expects to see batch_pending == 2, because two batch
> > > buffers we queued up behind the hang on this context. But, with the
> > > scheduler enabled (android, gen8), one of these batch buffers is still
> > > waiting in the scheduler and has not made it as far as the
> > > ring->request_list, so this batch buffer is unaffected by
> > > the ring reset, and batch_pending is only 1.
> > >
> > > I considered putting in a test for the scheduler being enabled, but
> > > decided that a simpler solution is to only queue up 1 batch buffer on
> > > the good context. This does not change the test logic in any way and
> > > ensures that we should always have batch_pending=1, with or without
> > > the scheduler.
> > 
> > For the scheduler/tdr we probably need to split this up into two testcases
> > each:
> > - one where the 2nd batch depends upon the first (cross-context depency).
> > - one where the 2nd batch doesn't depend upon the first (should execute
> >   unhampered with scheduler/tdr).
> > 
> > Since we don't yet have a scheduler/tdr both of these will result in the same
> > outcome (since there's always the temporal depency). But with your patch
> > you only test the 2nd case (and incompletely, we should assert that the 2nd
> > batch did run) and ignore the first case.
> > 
> > In short this failure here is telling you that your test coverage for these
> > features is lacking. The correct fix is not to mangle the existing, but fix it up to
> > correctly cover the new expectations in all cases. And that should be done as
> > part of the tdr/scheduler submission.
> > -Daniel
> > 
> Should gem_rest_stats be testing the operation of the scheduler? I would have
> thought that the scheduler operation should have its own test(s). Gem_reset_stats
> is just about the reset mechanism and the stats collected. I can add this test, just
> seems like the wrong place.

Well gem_reset_stats assumes that a hang will victimize all subsequent
batches, irrespective of context. tdr/scheduler change that rather
fundamentally (which is the entire point really, at least of tdr). So yeah
we need to adjust those existing testcase.

And I think it's clearer if we do that by changing the existing
test-cases, that way the impact on existing features for new code is much
clearer.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx