On Thu, Nov 26, 2015 at 12:59:37PM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 12:34:35PM +0100, Daniel Vetter wrote:
> > Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu
> > hang we need to be careful to not run into the "hanging too fast"
> > check:
> >
> > - don't restore the ban period, but instead keep it at 0.
> > - make sure we idle the gpu fully before hanging it again (wait
> >   subtest missed that).
> >
> > With this gem_eio now works reliably even when I don't run the
> > subtests individually.
> >
> > Of course it's a bit fishy that the default ctx gets blamed for
> > essentially doing nothing, but until that's figured out in upstream
> > it's better to make the test work for now.
>
> This used to be reliable. And just disabling all banning in the kernel
> forever more is silly.
>
> During igt_post_hang_ring:
> 1. we wait upon the hanging batch
>    - this returns when hangcheck fires
> 2. reset the ban period to normal
>    - this takes mutex_lock_interruptible and so must wait for the reset
>      handler to run before it can make the change,
>    - ergo the hanging batch never triggers a ban for itself.
>    - (a subsequent nonsimulated GPU hang may trigger the ban though)

This isn't where it dies. It dies when we do the echo 1 > i915_wedged.
I suspect quiescent_gpu or whatever is getting in the way, but I really
only wanted to get things to run first. And since i915_wedged is a
developer feature, and it does work perfectly if you don't intend to
reuse contexts, I didn't see any point in first trying to fix it up.

So I still maintain that this is a good enough approach, at least if
there's no obvious fix in-flight already.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch