On Mon, Jun 25, 2012 at 11:06 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Mon, 25 Jun 2012 22:49:03 +0200, Daniel Vetter <daniel at ffwll.ch> wrote:
>> On Mon, Jun 25, 2012 at 09:32:23PM +0100, Chris Wilson wrote:
>> > It looks like the patch to reuse check_wedge() should be first as it is
>> > the common theme in the series.
>>
>> Hm, actually I think I'll smash the check_wedge into the last patch. With
>> that change, this patch would solely be about not returning spurious -EIO,
>> whereas the last patch would be solely about not returning -EAGAIN in
>> cases we can't handle. Does that make some sense?
>
> The split sounds reasonable, grouping the patch in that manner should
> give a better story. My only holdout is that I don't want to lose the
> papering in i915_reset().

Hm, I'm not yet convinced of the quality of that duct tape. I went with
the unconditional mutex_lock, deadlocks be damned, approach (roughly the
contrast sketched below) because:

- QA has a machine that seemingly _always_ hits this problem. Here it
  depends upon the machine, ranging from "fails after a few hundred
  reset cycles" to "fails after 5 runs at most".

- Eric complained that when developing new userspace driver code the
  gpu gets wedged every once in a while, so he ends up resetting his
  gpu pretty much after every compile&run cycle.

So to make life easier for QA and userspace driver devs I've opted to
make the reset succeed (or deadlock), presuming the kernel code actually
works. Now kernel devs obviously prefer to blow things up with an OOPS
or two, so I see that we should strike some balance. But even there my
thinking is that waiting for the stuck-process backtrace is better than
trying to paper over severe issues when the system has clearly lost its
mind already.

So essentially I still fail to see the upside of your proposed duct
tape ... In either case I guess a walk to the reset button is inevitable
every once in a while ;-)

Yours, Daniel
--
Daniel Vetter
daniel.vetter at ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch
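
As a rough illustration of the -EIO vs -EAGAIN split discussed above,
here is a minimal, compileable sketch of what a shared check_wedge()
style helper boils down to. This is not the actual i915 code; struct
gpu_state and its fields are made-up stand-ins for the driver's
wedged/reset-completion state.

	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Made-up stand-in for the driver's hang-detection state. */
	struct gpu_state {
		bool wedged;            /* a GPU hang has been detected */
		bool reset_in_progress; /* the reset handler has not finished */
	};

	/*
	 * What should a waiter see while the GPU is unhappy?
	 *   0       - GPU is fine, keep going
	 *   -EAGAIN - a reset is pending; the caller can back off and retry
	 *   -EIO    - the reset failed, or the caller cannot retry; give up
	 */
	static int check_wedge(const struct gpu_state *gpu, bool interruptible)
	{
		if (!gpu->wedged)
			return 0;

		/* Non-interruptible callers cannot restart, so -EAGAIN
		 * would leak out as a spurious error; -EIO is all that
		 * is left for them. */
		if (!interruptible)
			return -EIO;

		/* The reset is still running: retry once it is done. */
		if (gpu->reset_in_progress)
			return -EAGAIN;

		/* Reset completed but still wedged: terminally broken. */
		return -EIO;
	}

	int main(void)
	{
		struct gpu_state gpu = { .wedged = true, .reset_in_progress = true };

		printf("interruptible waiter:     %d\n", check_wedge(&gpu, true));  /* -EAGAIN */
		printf("non-interruptible waiter: %d\n", check_wedge(&gpu, false)); /* -EIO */
		return 0;
	}

The patch-split discussion is about which of these codes escapes to
userspace in which situation, nothing more.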
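
And for the locking question, a minimal userspace sketch of the two
strategies for taking struct_mutex in the reset path, with a pthread
mutex standing in for the kernel mutex. The function names are invented
for illustration; neither variant is the actual i915_reset() code.

	#include <errno.h>
	#include <pthread.h>
	#include <stdio.h>

	/* Stand-in for dev->struct_mutex. */
	static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

	/* "Papering" variant: if the lock is contended, give up. The
	 * reset path never blocks, but if the lock holder is stuck the
	 * reset silently never happens. */
	static int reset_with_trylock(void)
	{
		if (pthread_mutex_trylock(&struct_mutex) != 0)
			return -EBUSY;	/* punt and hope somebody retries */

		/* ... perform the actual reset here ... */

		pthread_mutex_unlock(&struct_mutex);
		return 0;
	}

	/* Unconditional variant: always take the lock. If another thread
	 * holds struct_mutex forever we deadlock, but then the stuck-task
	 * backtrace points straight at the real bug instead of hiding it. */
	static int reset_with_lock(void)
	{
		pthread_mutex_lock(&struct_mutex);	/* may block indefinitely */

		/* ... perform the actual reset here ... */

		pthread_mutex_unlock(&struct_mutex);
		return 0;
	}

	int main(void)
	{
		printf("trylock variant: %d\n", reset_with_trylock());
		printf("lock variant:    %d\n", reset_with_lock());
		return 0;
	}

Build with cc -pthread. The trade-off argued in the thread is exactly
the comment above reset_with_lock(): a reset that is guaranteed to run
(or to deadlock visibly) versus a reset that can quietly be skipped.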