On Thu, Nov 26, 2015 at 10:03:01AM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 09:36:14AM +0100, Daniel Vetter wrote:
> > On Wed, Nov 25, 2015 at 04:34:13PM +0000, Chris Wilson wrote:
> > > On Wed, Nov 25, 2015 at 04:29:01PM +0000, Chris Wilson wrote:
> > > > On Wed, Nov 25, 2015 at 04:58:19PM +0100, Daniel Vetter wrote:
> > > > > This testcase tries to validate -EIO behaviour by disabling gpu reset
> > > > > support in the kernel. Except that the wait subtest forgot to do that,
> > > > > and therefore gets a return value of 0 instead of the expected -EIO.
> > > >
> > > > Wrong. It was intentionally not using reset=false.
> > >
> > > To be more precise, the reason here is that we are not wedging the GPU,
> > > but the expectation is that a wait upon a request that hangs reports the
> > > hang. Since the wait on GPU activity is explicit in the ioctl, the
> > > presumption is that the user actually cares about that activity and so
> > > should be given greater information about how it completes (timeout, GPU
> > > hung, or success).
> >
> > The only place we report hangs is in the reset_stats_ioctl. And
> > fundamentally wait_ioctl can't do (right now) what you want, since if the
> > reset recovery happens before userspace calls wait_ioctl then it will
> > happily return 0 for success.
> >
> > So if you want this then we need:
> > - a reason for userspace to want this
> > - a reorg of all our reset handling, to move (or well, copy) the reset
> >   stats to every object
>
> I buy that if it resets before the wait_ioctl then it is invisible,
> which makes the information incomplete. I still think that as a caller
> of wait_ioctl, knowing that my wait was broken by a reset is valuable
> information. Drivers use some form of waiting to throttle themselves,
> so a wait is a convenient point for learning that a reset has occurred
> and taking action. This use case also wouldn't miss a reset happening.

Iirc we even discussed this when doing arb robustness, but the trouble is
that arb robustness really wants its reset stats per context. Until that
changes (with fences or not) I think we should just curb the -EIO from
wait_ioctl like in all the other places.

> You may disagree about wait_ioctl, but a wait_fence must definitely
> report -EIO. (And this is as close to wait_fence as we currently have in
> the code and in use today.)

Hm, the plan for fence waiting is to use the android syncpt stuff. I
didn't check whether that bothers with reporting failures, but it should
be simple to add. And it definitely makes sense.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
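
For reference, the userspace side under discussion looks roughly like the
sketch below. This is illustrative code, not from the thread: it assumes
libdrm's drmIoctl() plus the i915 uapi headers, and the helper names
wait_bo() and gpu_reset_count() are made up for the example. It shows the
three completion modes Chris mentions (success, timeout, GPU hung), and
how a caller can spot the case Daniel describes, where reset recovery
completes the request before the wait and the ioctl returns 0:

	#include <errno.h>
	#include <stdint.h>
	#include <xf86drm.h>
	#include <i915_drm.h>

	/* Wait on a bo: returns 0 on success, -ETIME on timeout, -EIO if
	 * the kernel reports the GPU hung (the behaviour debated above). */
	static int wait_bo(int fd, uint32_t bo_handle, int64_t timeout_ns)
	{
		struct drm_i915_gem_wait wait = {
			.bo_handle = bo_handle,
			.timeout_ns = timeout_ns,
		};

		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait) == 0)
			return 0;
		return -errno;
	}

	/* Global GPU reset count as reported by the reset_stats_ioctl;
	 * ctx_id 0 is the default context. */
	static uint32_t gpu_reset_count(int fd)
	{
		struct drm_i915_reset_stats stats = { .ctx_id = 0 };

		if (drmIoctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats))
			return 0; /* sketch: error handling elided */
		return stats.reset_count;
	}

A caller that sees wait_bo() return 0 can still compare gpu_reset_count()
from before and after the wait: if the count moved, the request was
completed by reset recovery rather than by the GPU, which is exactly the
window in which wait_ioctl happily reports success today.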