Re: [PATCH i-g-t] tests/gem_reset_stats.c: prepare for per engine resets

Tim Gore 
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ


> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of Daniel
> Vetter
> Sent: Wednesday, November 18, 2015 2:12 PM
> To: Gore, Tim
> Cc: Daniel Vetter; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> mika.kuoppala@xxxxxxxxx; Wood, Thomas
> Subject: Re:  [PATCH i-g-t] tests/gem_reset_stats.c: prepare for
> per engine resets
> 
> On Wed, Nov 18, 2015 at 12:15:11PM +0000, Gore, Tim wrote:
> >
> >
> > Tim Gore
> > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon
> > SN3 1RJ
> >
> >
> > > -----Original Message-----
> > > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of
> > > Daniel Vetter
> > > Sent: Wednesday, November 18, 2015 11:54 AM
> > > To: Gore, Tim
> > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; mika.kuoppala@xxxxxxxxx; Wood,
> > > Thomas
> > > Subject: Re:  [PATCH i-g-t] tests/gem_reset_stats.c:
> > > prepare for per engine resets
> > >
> > > On Wed, Nov 18, 2015 at 10:24:43AM +0000, tim.gore@xxxxxxxxx wrote:
> > > > From: Tim Gore <tim.gore@xxxxxxxxx>
> > > >
> > > > > When checking to make sure that the driver has performed the
> > > > > expected number of resets, this test looks at the reset_count,
> > > > > which is incremented each time the GPU is reset. Upcoming changes
> > > > > in the way GPU hangs are handled mean that in most cases (and in
> > > > > all the cases in this test) only a single GPU engine is reset,
> > > > > which does not cause the reset_count to be incremented. This is
> > > > > already causing this test to fail on Android. In this case we can
> > > > > instead look at the batch_active count, which is also returned from
> > > > > the i915_get_reset_stats_ioctl and is incremented by both a single
> > > > > engine reset and a full GPU reset.
> > > > > There are differences between the reset_count and the batch_active
> > > > > count, but for establishing that the correct number of resets have
> > > > > occurred either can be used.
> > > > > This change enables this test to run successfully on Android and
> > > > > will mean that the test does not break when the TDR patches get
> > > > > merged into the upstream driver.
> > > >
> > > > Signed-off-by: Tim Gore <tim.gore@xxxxxxxxx>
> > >
> > > Why doesn't TDR just count resets correctly?
> > > -Daniel
> > >
> > It does. It just doesn't do GPU resets for most hangs. Instead it
> > resets a single command streamer, which is much less intrusive and
> > loses the minimum amount of work.
> 
> Again: Why does TDR not count engine resets as resets? Afaiui that seems to
> be the problem you're working around here.
> -Daniel
> 
This was not intended as a workaround, just to align the test with the way TDR
is being implemented. I assume TDR chose not to alter the meaning of the
reset_counter as this is implicitly part of the interface to userland. We can change
reset_counter to count both GPU resets and engine resets, but I think this
may have knock-on effects for the OpenGL robustness interface, which uses this
ioctl. I'll check with the OpenGL team, as I know they are still working on this.
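
To make the distinction between the two counters concrete, here is a minimal
sketch in C. It is only a toy model of the behaviour described in this thread,
not driver or i-g-t code: the struct and the model_* helpers are hypothetical
names invented for illustration, and the assumed semantics (an engine reset
bumps only batch_active, a full GPU reset bumps both) are taken from the
commit message rather than from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters returned by the reset-stats ioctl.
 * Nothing here talks to the driver. */
struct model_reset_stats {
	uint32_t reset_count;   /* assumed: counts full GPU resets only */
	uint32_t batch_active;  /* assumed: counts hangs with a batch running */
};

/* Assumed TDR behaviour: a single-engine reset leaves reset_count alone. */
static void model_engine_reset(struct model_reset_stats *rs)
{
	rs->batch_active++;
}

/* A full GPU reset is visible in both counters. */
static void model_full_gpu_reset(struct model_reset_stats *rs)
{
	rs->reset_count++;
	rs->batch_active++;
}

/* Mirrors the check in the patched test: starting from a clean context,
 * exactly one hang must show up in batch_active, whichever kind of
 * reset handled it. */
static void model_expect_one_hang(const struct model_reset_stats *rs)
{
	assert(rs->batch_active == 1);
}
```

Under this model, a test that waits for reset_count to advance never sees an
engine reset, whereas a test that watches batch_active sees both kinds. If
reset_count were changed to include engine resets, any userland that treats
it as "number of full GPU resets" would observe different values.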

> >   Tim
> >
> > > > ---
> > > > >  tests/gem_reset_stats.c | 41 ++++++++++++++++++++++++++---------------
> > > >  1 file changed, 26 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/tests/gem_reset_stats.c b/tests/gem_reset_stats.c
> > > > index 4cbbb4e..5ec026f 100644
> > > > --- a/tests/gem_reset_stats.c
> > > > +++ b/tests/gem_reset_stats.c
> > > > @@ -104,9 +104,9 @@ static int gem_reset_stats(int fd, int ctx_id,
> > > >
> > > >  	rs->ctx_id = ctx_id;
> > > >  	rs->flags = 0;
> > > > -	rs->reset_count = rand();
> > > > -	rs->batch_active = rand();
> > > > -	rs->batch_pending = rand();
> > > > +	rs->reset_count = UINT32_MAX;
> > > > +	rs->batch_active = UINT32_MAX;
> > > > +	rs->batch_pending = UINT32_MAX;
> > > >  	rs->pad = 0;
> > > >
> > > >  	do {
> > > > @@ -690,6 +690,18 @@ static int get_reset_count(int fd, int ctx)
> > > >  	return rs.reset_count;
> > > >  }
> > > >
> > > > > +static int get_active_count(int fd, int ctx)
> > > > > +{
> > > > +	int ret;
> > > > +	struct local_drm_i915_reset_stats rs;
> > > > +
> > > > +	ret = gem_reset_stats(fd, ctx, &rs);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	return rs.batch_active;
> > > > +}
> > > > +
> > > > >  static void test_close_pending_ctx(void)
> > > > >  {
> > > >  	int fd, h;
> > > > @@ -837,17 +849,16 @@ static void test_reset_count(const bool
> > > > create_ctx)
> > > >
> > > >  	assert_reset_status(fd, ctx, RS_NO_ERROR);
> > > >
> > > > -	c1 = get_reset_count(fd, ctx);
> > > > -	igt_assert(c1 >= 0);
> > > > +	c1 = get_active_count(fd, ctx);
> > > > +	igt_assert(c1 == 0);
> > > >
> > > >  	h = inject_hang(fd, ctx);
> > > >  	igt_assert_lte(0, h);
> > > >  	gem_sync(fd, h);
> > > >
> > > >  	assert_reset_status(fd, ctx, RS_BATCH_ACTIVE);
> > > > -	c2 = get_reset_count(fd, ctx);
> > > > -	igt_assert(c2 >= 0);
> > > > -	igt_assert(c2 == (c1 + 1));
> > > > +	c2 = get_active_count(fd, ctx);
> > > > +	igt_assert(c2 == 1);
> > > >
> > > >  	igt_fork(child, 1) {
> > > >  		igt_drop_root();
> > > > @@ -877,9 +888,9 @@ static int _test_params(int fd, int ctx,
> > > > uint32_t flags, uint32_t pad)
> > > >
> > > >  	rs.ctx_id = ctx;
> > > >  	rs.flags = flags;
> > > > -	rs.reset_count = rand();
> > > > -	rs.batch_active = rand();
> > > > -	rs.batch_pending = rand();
> > > > +	rs.reset_count = UINT32_MAX;
> > > > +	rs.batch_active = UINT32_MAX;
> > > > +	rs.batch_pending = UINT32_MAX;
> > > >  	rs.pad = pad;
> > > >
> > > >  	do {
> > > > @@ -976,14 +987,14 @@ static void defer_hangcheck(int ring_num)
> > > >
> > > >  	igt_skip_on(next_ring == current_ring);
> > > >
> > > > -	count_start = get_reset_count(fd, 0);
> > > > -	igt_assert_lte(0, count_start);
> > > > +	count_start = get_active_count(fd, 0);
> > > > +	igt_assert(count_start == 0);
> > > >
> > > >  	igt_assert(inject_hang_ring(fd, 0, current_ring->exec, true));
> > > >  	while (--seconds) {
> > > >  		igt_assert(exec_valid_ring(fd, 0, next_ring->exec));
> > > >
> > > > -		count_end = get_reset_count(fd, 0);
> > > > +		count_end = get_active_count(fd, 0);
> > > >  		igt_assert_lte(0, count_end);
> > > >
> > > >  		if (count_end > count_start)
> > > > @@ -992,7 +1003,7 @@ static void defer_hangcheck(int ring_num)
> > > >  		sleep(1);
> > > >  	}
> > > >
> > > > -	igt_assert_lt(count_start, count_end);
> > > > +	igt_assert(count_end == 1);
> > > >
> > > >  	close(fd);
> > > >  }
> > > > --
> > > > 1.9.1
> > > >
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation http://blog.ffwll.ch
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch