On Mon, Jul 13, 2015 at 04:07:14PM +0000, Gore, Tim wrote: > > > Tim Gore > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > > > > -----Original Message----- > > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of Daniel > > Vetter > > Sent: Monday, July 13, 2015 3:59 PM > > To: Gore, Tim > > Cc: Daniel Vetter; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Wood, Thomas > > Subject: Re: [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > > mode bits > > > > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote: > > > > > > > > > Tim Gore > > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon > > > SN3 1RJ > > > > > > > -----Original Message----- > > > > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of > > > > Daniel Vetter > > > > Sent: Monday, July 13, 2015 10:30 AM > > > > To: Gore, Tim > > > > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; Wood, Thomas > > > > Subject: Re: [PATCH i-g-t] lib/igt_gt.c : allow changes > > > > to stop_rings mode bits > > > > > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@xxxxxxxxx wrote: > > > > > From: Tim Gore <tim.gore@xxxxxxxxx> > > > > > > > > > > In function igt_set_stop_rings, the test > > > > > igt_assert_f(flags == 0 || current == 0, .. > > > > > > > > > > will fail if we are trying to force a hang but the > > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > > > > > With the introduction of per ring resets in the driver (in > > > > > android) these bits do not get cleared to zero when an individual > > > > > ring is reset. This causes subsequent attempt to cause a ring hang > > > > > via this function to fail, leading to several igt tests failing > > > > > (ie gem_reset_stats subtest ban-xxx etc). > > > > > > > > Fix tdr to reset these instead? > > > > -Daniel > > > > > > > I could change tdr, but why. When the TDR handles a ring hang and > > > resets the ring, why would it modify the flag that defines if the > > > driver should ban a frequently hanging context? If we get rid of the > > > stop_rings interface, as Chris Wilson suggested, we would still need > > > to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, but > > you > > > would not expect to have to re-write these bits each time there is a ring > > reset. > > > > The fix current hang recover code to no reset this, add some grace period, > > then push this patch to igt. We don't have full-blown abi guarantees for > > debugfs/igt stuff, but I want at least a few months (really last released > > kernel&igt) of backwards/forward compatibility. And inconsistent behaviour > > isn't great imo. > > -Daniel > > Sorry Daniel, I didn't really follow that. > I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will defeat the > point of this bit, except perhaps in test situations where you can keep setting it > each time you deliberately cause a hang. It seems like the ALLOW_BAN bit has > uses in real world situations, although I don't know it anyone uses it. ALLOW_BAN is only for testing. It's meant to re-enable auto-banning because we disable that by default for automated tests. This is all meant to be used together with the stop_rings stuff only. > Would you be more comfortable with > Igt_assert_f ( ( flags ==0 ) || > (( current & STOP_RING_ALL) ==0) && ((current ^ flags) & ~ STOP_RING_ALL == 0 ) ) > > So the either the new flags must be 0 (currently allowed) or the existing flags must > indicate that all hangs are cleared (0 except possibly the mode bits) AND the mode > bits you are writing are the same as the current values. ?? Iirc gem_reset_stats uses stop_rings only to supress gpu reset warnings in dmesg (since they're expected) and not for the actual stop_rings logic. It sets stop_rings after submitting the hanging batch, but before that one is detected as hung. We could just add another bit to stop_rings for that case. What I still don't understand is why tdr can't just keep on properly resetting stop_rings. It'll break tons of existing tests, and I don't understand the upside. stop_rings has become quasi-abi (that's why the ALLOW bits have such funky semantics, it's for backwards compat), if you need to change it you need to extend it, not break it. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx