On Thu, Jun 13, 2013 at 11:53:13AM +0200, Daniel Vetter wrote:
> On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote:
> > On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote:
> > > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> > > > To count context losses, add struct i915_ctx_hang_stats for
> > > > both i915_hw_context and drm_i915_file_private.
> > > > drm_i915_file_private is used when there is no context.
> > > >
> > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> > > >
> > > > Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
> > > >
> > > I don't have time to do a proper review before Daniel wants to merge
> > > these, and Chris has already reviewed it.
> > >
> > > 1-6 are:
> > > Acked-by: Ben Widawsky <ben at bwidawsk.net>
> > >
> > > I don't really like the behavior of 7. At least, I'd like to make it
> > > something that can be disabled via debugfs, sysfs, or a module
> > > parameter (I'd very much prefer it to be opt-in, too). TBH, I only
> > > read it very quickly, and I'm not horribly opposed to it; it's just a
> > > bunch of complexity for, IMO, little gain. Presumably the problem it's
> > > trying to solve should be fixed with a fix to the ddx, mesa, libva,
> > > the client, whatever. In the embedded case, the same thing applies.
> > > Banning the guilty client doesn't make the user experience any
> > > better. So the only thing I see is DoS protection, but we've never
> > > *really* made that our priority anyway, so, meh.
> >
> > Right, it is policy. But it is existing policy. Ultimately we want to
> > get as much of that decision out of the kernel.
>
> Merged the entire series with a little note added to this patch
> explaining the justification for it. Thanks for the patches and review.

I think a careful review of the locking would be good here. I see a few
cases:

- dev_priv->hangcheck stats, which are only ever touched by the hangcheck
  timer. Safe, since the hangcheck timer is non-reentrant, but it would be
  good to add a comment saying so.

- Data shared between hangcheck and the reset work. This probably needs
  spinlock protection; I'd add a new one. I know it's rather unlikely that
  we'll race, but if we e.g. increase the hangcheck rate a lot we might
  fire the next hangcheck before the reset work has completed. Or the
  scheduler could be really annoying and not give the work cpu time for a
  while.

- Guilty-context stats in the context structs. Atm this is protected by
  dev->struct_mutex I think, but I'd like to not proliferate the use of
  that lock if possible (there's _way_ too much legacy baggage attached to
  it). So I'd vote for the introduction of a new mutex.

- Anything else I've missed in my quick review.

A rough sketch of what such a locking split could look like is appended
below. I think a patch for each class of data explaining how it's accessed
and how exactly it's protected (both in the commit message and with a short
comment in the headers) would be really fine. Volunteered?

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
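
For illustration only, a minimal sketch of the kind of locking split
described above. All names here (i915_gpu_error_stats, reset_lock,
reset_pending, hang_stats_lock and the helper functions) are made up for
this sketch and are not the structures or fields from the actual i915
patches; the point is just the split: no lock for data only ever touched by
the hangcheck timer, a new spinlock for data shared between the hangcheck
timer and the reset work, and a new mutex for the per-context hang stats so
dev->struct_mutex stays out of it.

#include <linux/spinlock.h>
#include <linux/mutex.h>

struct i915_gpu_error_stats {
	/* Only ever touched from the hangcheck timer; the timer is
	 * non-reentrant, so no lock needed, just this comment. */
	unsigned long last_hangcheck;

	/* Shared between the hangcheck timer and the reset work. */
	spinlock_t reset_lock;
	unsigned int reset_pending;

	/* Protects the per-context/per-file hang stats instead of
	 * piling more onto dev->struct_mutex. */
	struct mutex hang_stats_lock;
};

static void error_stats_init(struct i915_gpu_error_stats *e)
{
	spin_lock_init(&e->reset_lock);
	mutex_init(&e->hang_stats_lock);
}

static void hangcheck_mark_reset(struct i915_gpu_error_stats *e)
{
	/* Timer (softirq) context: may not sleep, so a spinlock. */
	spin_lock(&e->reset_lock);
	e->reset_pending++;
	spin_unlock(&e->reset_lock);
}

static void reset_work_handler(struct i915_gpu_error_stats *e)
{
	/* Process context, but shares reset_pending with the timer:
	 * take the _bh variant so the timer can't fire on this cpu
	 * while we hold the lock. */
	spin_lock_bh(&e->reset_lock);
	e->reset_pending = 0;
	spin_unlock_bh(&e->reset_lock);
}

The choice of a spinlock rather than a mutex for the hangcheck/reset-work
data assumes the hangcheck runs from timer (softirq) context, where sleeping
is not allowed; the reset work runs in process context and uses the _bh
variant so it cannot deadlock against the timer on the same cpu. The
per-context hang stats, on the other hand, would only be reached from
ioctl/process context, so a mutex is fine there.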