On Mon, May 12, 2014 at 09:41:02AM -0700, Volkin, Bradley D wrote: > On Mon, May 12, 2014 at 09:24:06AM -0700, Daniel Vetter wrote: > > On Thu, May 08, 2014 at 09:02:18AM -0700, Volkin, Bradley D wrote: > > > On Thu, May 08, 2014 at 08:45:07AM -0700, Ville Syrjälä wrote: > > > > On Thu, May 08, 2014 at 08:27:16AM -0700, Volkin, Bradley D wrote: > > > > > On Thu, May 08, 2014 at 02:56:05AM -0700, Tvrtko Ursulin wrote: > > > > > > > > > > > > Hi Brad, > > > > > > > > > > > > On 04/28/2014 04:22 PM, bradley.d.volkin@xxxxxxxxx wrote: > > > > > > [snip] > > > > > > > - BUG_ON(!validate_cmds_sorted(ring)); > > > > > > > + BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count)); > > > > > > > BUG_ON(!validate_regs_sorted(ring)); > > > > > > > + > > > > > > > + BUG_ON(init_hash_table(ring, cmd_tables, cmd_table_count)); > > > > > > > > > > > > Is a BUG_ON a bit harsh since the above fails only on ENOMEM condition? > > > > > > > > > > > > If the concern is not allowing any command execution if parser setup has > > > > > > failed, it would be nicer to the system as whole to just keep rejecting > > > > > > everything, but let the rest of the kernel live to enable debug or whatever? > > > > > > > > > > > > I know it won't happen almost ever so it's a minor point really. I just > > > > > > dislike actively hosing the whole system if it is avoidable. > > > > > > > > > > Hi Tvrtko, > > > > > > > > > > I agree that a BUG_ON might be harsh here. I suppose we could log an > > > > > error and disable the command parser. Most command buffers would > > > > > still go through fine but HW parsing would reject some that the SW > > > > > parser might otherwise allow. That could be a bit odd if we ever did > > > > > get a failure - apps/functionality that worked the last time I booted > > > > > suddenly don't this time. The issue would be in the log though. > > > > > > > > > > I don't have a strong preference on this. Whatever people prefer. > > > > > > > > If the memory allocation fails there's probably not much point in > > > > trying to limp along and continue the driver init. So just pass error > > > > up and let the caller deal with it. > > > > > > > > Looking at the error paths up from ring init, we probably leak a ton > > > > of junk but at least the kernel should remain otherwise operational. > > > > > > Passing the error up is probably a good option. There was a recent change > > > to mark the GPU as wedged if ring init fails, which seems like a reasonable > > > enough response to parser init failure. I'll have to look at the cleanup > > > paths. The per-ring parser init can probably move around to avoid excessive > > > additional cleanup code on failure paths. > > > > With the latest ring init rework from Chris we should just fail the gem > > side of things and declare the gpu terminally wedged. But the modeset side > > should keep on working, which means users will be able to take a look at > > dmesg instead of a black screen (since that's what you get from BUG_ON in > > driver load often). > > > > Working modeset with broken gpu is infinitely better than a black screen, > > so no BUG_ON here please. At least if the error code isn't much more than > > a simply return, we always have to balance bugs in the error code with > > better abilities to limp on long enough for a useful bug report. > > Ok. I sent a v2 and in that I have i915_cmd_parser_init_ring() pass the > error returned by init_hash_table() back to the caller, who passes it up, > etc. When I looked more closely at Chris' change, the wedge behavior only > happens if the return is -EIO, which it will not be in this case. The only > failure from init_hash_table() is -ENOMEM so if we specifically want the > wedging behavior (vs whatever would normally happen if ring init fails with > -ENOMEM) then we need to modify the return from i915_cmd_parser_init_ring() > to be -EIO. Probably also add a comment explaining why we do the modification. > > I'm fine either way. The feedback earlier seemed to be that if we have an > allocation failure we probably can't do much else anyhow, hence just returning > the -ENOMEM. Oh right. Since -ENOMEM should really be extremely exceptional I guess the current code is ok. I've mixed things up a bit ;-) But generally if the hw goes bananas we should continue to limp alone, if it makes sense. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx