[PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup

daniel at ffwll.ch (Daniel Vetter) · Mon, 6 May 2013 11:44:22 +0200

On Mon, May 06, 2013 at 11:40:06AM +0200, Daniel Vetter wrote:
> On Thu, May 02, 2013 at 01:27:32PM -0700, Jesse Barnes wrote:
> > On Tue, 23 Apr 2013 23:15:29 -0700
> > Ben Widawsky <ben at bwidawsk.net> wrote:
> > 
> > > Because our context refcounting doesn't grab a ref at lookup time, it is
> > > unsafe to do so without the lock.
> > > 
> > > NOTE: We don't have an easy way to put the assertion in the lookup
> > > function which is where this really belongs. Context switching is good
> > > enough because it actually asserts even more correctness by protecting
> > > the default_context.
> > > 
> > > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_context.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > > index a1e8ecb..411ace0 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring,
> > >  	if (dev_priv->hw_contexts_disabled)
> > >  		return 0;
> > >  
> > > +	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> > > +
> > >  	if (ring != &dev_priv->ring[RCS])
> > >  		return 0;
> > >  
> > 
> > Simple enough.
> > 
> > Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> > 
> > We usually do WARN_ONs for this stuff though, in case a user actually
> > does hit it, it may not be fatal so why crash the machine?
> > 
> > But that's a minor distinction since we shouldn't hit this except in
> > development anyway.
> 
> Well, since this is a patch for upstream the focus should very much be on
> supporting bug reporters and not developers. And for bug reporters a BUG
> is much more annoying than a WARN and greatly reduces the chances that
> we'll get a bug report.

Some more details why a WARN massively beats a BUG for us: BUG kills the
current process and ensures all locks are stuck. Usually that means X is
dead and you can't vt-switch away to the console to take a quick look at
dmesg.

Now even when all rendering is down the toilet due to the follow-up damage
after the WARN and the gpu a zombie, there's a non-zero chance that
vt-switch (or sw rendering in X) will work long enough to grab log files
and debugfs data.

Hence the first rule to only use a BUG on if we have a guaranteed OOPS
otherwise (which again will kill the process and make all locks stuck).
-Daniel

> 
> There are imo only very few cases where a BUG instead of a WARN is
> justified:
> - The kernel is _guaranteed_ to oops in the next few lines anyway, so a
>   BUG_ON will help in readability of the backtrace. Note that checking 3
>   different things for non-NULL in the same BUG actually reduces OOPS
>   readability (with an oops you can at least reconstruct the faulting
>   address and so probably the pointer). Also, this means the BUG should
>   have a neat description of what exactly blew up.
> 
>   The "a few lines" part is just a guideline with some big exceptions. A
>   prime example is refcount over/underflows since those will blow up, but
>   only sometimes later (and usually no one will have a clue why).
> 
> - BUGs are justified if there's a potential security hole awaiting, e.g.
>   when something in the userspace input validation has gone wrong.
> 
> - I'm wary of special error handling for WARNs, but if an early return
>   (with an error code if possible) transforms a BUG into a WARN I'm in.
>   But trying to fix up e.g. modeset state is usually futile, since we'll
>   end up with a black/fuzzy/corrupted screen most likely anyway.
> 
> But the most important rule is: In case of doubt, just WARN, don't BUG.
> 
> [I know, I violate it sometimes, too.]
> 
> Patch applied with the s/BUG/WARN bikeshed.
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch