On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > > -EIO:
> > > >
> > > > - return -EIO when the gpu is wedged. Not terribly useful for
> > > >   userspace since it might race with a hang and then there's no
> > > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > > >   A terminal wedge really can only be reliably signalled at execbuf
> > > >   time, and userspace needs to cope with that (or decide not to
> > > >   bother).
> > > >
> > > > - EIO for any obj that suffered from a reset. This means big internal
> > > >   reorganization in the kernel since currently we track reset stats
> > > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > > >   We could do this, but this feels like new ABI territory with the
> > > >   usual userspace requirements and high hurdles.
> > > >
> > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > > >   implement. Which is what this patch does.
> > >
> > > Since no one else is weighing in on the ABI discussion, I'm happy with
> > > losing EIO here. I thought it could be useful, but as no one is using or
> > > seems likely to start using it, begone.
> > >
> > > > We can always opt to change this later on if there's a real need.
> > > >
> > > > To make the test really exercise this do a full wedged gpu hang, to
> > > > make sure -EIO doesn't leak out at all.
> > > >
> > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > > > ---
> > > >  tests/gem_eio.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > > index a24c8f1c53b5..8345d1a7a429 100644
> > > > --- a/tests/gem_eio.c
> > > > +++ b/tests/gem_eio.c
> > > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > > >  {
> > > >  	igt_hang_ring_t hang;
> > > >
> > > > +	igt_require(i915_reset_control(false));
> > >
> > > However, this is not required to test the ABI change above as the wait
> > > itself will still hang, whether or not it wedges the GPU.
> >
> > Yes it's not strictly required, but without it the testcase is fairly
> > boring. If we move the check_wedge out of wait_request then a normal gpu
> > reset would always return 0 (after retrying a few times perhaps), so I
> > figured testing the wedged case is the only one that's worth it.
>
> But wedging during the hang is also not interesting as we have no
> opportunity to see the reset failure in the test case. Putting the GPU
> into the wedged state before the wait, should be a trivial test that the
> object is idle after the reset.

Right now (with current kernels) we see an -EIO with this testcase instead
of 0 in the wait. Without disabling reset we see 0 both on fixed and broken
kernels. So I don't really see why not testing this case is a good idea?
It's the one we're currently failing at, leaking -EIO to userspace.
Assuming ofc we still go with the "let's curb -EIO except for execbuf" ABI.

> > Maybe we should dupe the subtests all and have wedged and non-wedged cases
> > for all of them? That would imo make more sense.
>
> The others, what matters is how we handle the GPU being wedged before we
> queue an execbuf or throttling.
> In terms of testing no error is reported
> for the hanging case, we should add tests for set-domain (so that it is
> explicit and not reliant on implementation inside lib/), GTT faulting
> (that would need both wedged and hanging cases, but we are not likely to
> be able to cover all the cases like waiting on fence and other secondary
> waits) and throttle.

Yeah there's more gaps than just the one above.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
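
For reference, the semantics discussed in this thread (no -EIO from wait, -EIO
only from execbuf once the GPU is wedged) can be sketched as a toy model in C.
This is not i915 or igt code: toy_gpu, toy_obj, toy_handle_hang, toy_wait and
toy_execbuf are all hypothetical names, and returning -ETIME for a still-busy
object is an assumption made purely for illustration.

```c
#include <assert.h>   /* for assert()-style checks when exercising the model */
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in i915 or igt. */
struct toy_gpu {
	bool wedged;	/* reset failed or was disabled; GPU is unusable */
};

struct toy_obj {
	bool busy;	/* outstanding rendering on this object */
};

/*
 * A hang was detected. With reset disabled the GPU ends up wedged.
 * Either way the outstanding work is retired, so the object goes idle.
 */
static void toy_handle_hang(struct toy_gpu *gpu, struct toy_obj *obj,
			    bool reset_enabled)
{
	if (!reset_enabled)
		gpu->wedged = true;
	obj->busy = false;
}

/* wait, third proposal: never report -EIO, return 0 once the obj is idle. */
static int toy_wait(const struct toy_gpu *gpu, const struct toy_obj *obj)
{
	(void)gpu;	/* wedged state is deliberately ignored here */
	return obj->busy ? -ETIME : 0;
}

/* execbuf: the one place where a terminal wedge is signalled. */
static int toy_execbuf(const struct toy_gpu *gpu, struct toy_obj *obj)
{
	if (gpu->wedged)
		return -EIO;
	obj->busy = true;
	return 0;
}
```

In this model, a hang with reset disabled leaves the toy GPU wedged;
toy_wait() then returns 0 because the object is idle, and only the next
toy_execbuf() returns -EIO.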