On Tue, Jul 3, 2012 at 5:59 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> My experience with these patches is that they make it less likely that
> the hang is reported to the userspace in a timely fashion (filling the
> ring full leads to lots of lost rendering) and worse make it much more
> likely that i915_gem_fault() hits an EIO and goes bang. That is
> unacceptable and trivial to hit with these patches. I have not yet
> reproduced that issue using the same broken renderer without these
> patches.

Hm, I don't see how these patches allow much more rendering to be
queued up until we stop everything. For single-threaded userspace that
should be at most one additional batch (which might have been the one
that could catch the spurious -EIO). All subsequent execbuf ioctl calls
should stall when trying to grab the mutex. The same holds when the gpu
reset has failed: userspace should be able to submit one more batch
afaict until it gets an -EIO.

So can you please dig a bit more into what exactly you're seeing and
unconfuse me?

> I do think the patches are a step in the right direction, but with the
> change in userspace behaviour it has to be a NAK for the time being.

Ok, I guess I'll have to throw the -EIO sigbus eater into the mix, too.
After all, userspace is supposed to call set_domain(GTT) before
accessing the gtt mmap, so it should still get notified when the gpu
has died and rendering might be incomplete.

Imo we should still return -EIO for all ioctls; userspace should be
able to cope with these. (In the future we might even add optional
behaviour to signal potentially dropped rendering due to a gpu reset at
wait_rendering, for userspace that really cares.)

So would the sigbus eater be good enough or do we need more?

Thanks, Daniel
-- 
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch