Re: [PATCH] drm/i915: Use hash tables for the command parser

Daniel Vetter <daniel@xxxxxxxx> · Mon, 12 May 2014 19:47:06 +0200

On Mon, May 12, 2014 at 09:41:02AM -0700, Volkin, Bradley D wrote:
> On Mon, May 12, 2014 at 09:24:06AM -0700, Daniel Vetter wrote:
> > On Thu, May 08, 2014 at 09:02:18AM -0700, Volkin, Bradley D wrote:
> > > On Thu, May 08, 2014 at 08:45:07AM -0700, Ville Syrjälä wrote:
> > > > On Thu, May 08, 2014 at 08:27:16AM -0700, Volkin, Bradley D wrote:
> > > > > On Thu, May 08, 2014 at 02:56:05AM -0700, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > Hi Brad,
> > > > > > 
> > > > > > On 04/28/2014 04:22 PM, bradley.d.volkin@xxxxxxxxx wrote:
> > > > > > [snip]
> > > > > > > -	BUG_ON(!validate_cmds_sorted(ring));
> > > > > > > +	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
> > > > > > >   	BUG_ON(!validate_regs_sorted(ring));
> > > > > > > +
> > > > > > > +	BUG_ON(init_hash_table(ring, cmd_tables, cmd_table_count));
> > > > > > 
> > > > > > Is a BUG_ON a bit harsh since the above fails only on ENOMEM condition?
> > > > > > 
> > > > > > If the concern is not allowing any command execution if parser setup has 
> > > > > > failed, it would be nicer to the system as whole to just keep rejecting 
> > > > > > everything, but let the rest of the kernel live to enable debug or whatever?
> > > > > > 
> > > > > > I know it won't happen almost ever so it's a minor point really. I just 
> > > > > > dislike actively hosing the whole system if it is avoidable.
> > > > > 
> > > > > Hi Tvrtko,
> > > > > 
> > > > > I agree that a BUG_ON might be harsh here. I suppose we could log an
> > > > > error and disable the command parser. Most command buffers would
> > > > > still go through fine but HW parsing would reject some that the SW
> > > > > parser might otherwise allow. That could be a bit odd if we ever did
> > > > > get a failure - apps/functionality that worked the last time I booted
> > > > > suddenly don't this time. The issue would be in the log though.
> > > > > 
> > > > > I don't have a strong preference on this. Whatever people prefer.
> > > > 
> > > > If the memory allocation fails there's probably not much point in
> > > > trying to limp along and continue the driver init. So just pass error
> > > > up and let the caller deal with it.
> > > > 
> > > > Looking at the error paths up from ring init, we probably leak a ton
> > > > of junk but at least the kernel should remain otherwise operational.
> > > 
> > > Passing the error up is probably a good option. There was a recent change
> > > to mark the GPU as wedged if ring init fails, which seems like a reasonable
> > > enough response to parser init failure. I'll have to look at the cleanup
> > > paths. The per-ring parser init can probably move around to avoid excessive
> > > additional cleanup code on failure paths.
> > 
> > With the latest ring init rework from Chris we should just fail the gem
> > side of things and declare the gpu terminally wedged. But the modeset side
> > should keep on working, which means users will be able to take a look at
> > dmesg instead of a black screen (since that's what you get from BUG_ON in
> > driver load often).
> > 
> > Working modeset with broken gpu is infinitely better than a black screen,
> > so no BUG_ON here please. At least if the error code isn't much more than
> > a simply return, we always have to balance bugs in the error code with
> > better abilities to limp on long enough for a useful bug report.
> 
> Ok. I sent a v2 and in that I have i915_cmd_parser_init_ring() pass the
> error returned by init_hash_table() back to the caller, who passes it up,
> etc. When I looked more closely at Chris' change, the wedge behavior only
> happens if the return is -EIO, which it will not be in this case. The only
> failure from init_hash_table() is -ENOMEM so if we specifically want the
> wedging behavior (vs whatever would normally happen if ring init fails with
> -ENOMEM) then we need to modify the return from i915_cmd_parser_init_ring()
> to be -EIO. Probably also add a comment explaining why we do the modification.
> 
> I'm fine either way. The feedback earlier seemed to be that if we have an
> allocation failure we probably can't do much else anyhow, hence just returning
> the -ENOMEM.

Oh right. Since -ENOMEM should really be extremely exceptional I guess the
current code is ok. I've mixed things up a bit ;-)

But generally if the hw goes bananas we should continue to limp alone, if
it makes sense.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx