On Tue, Mar 25, 2014 at 04:36:05PM +0100, Daniel Vetter wrote: > On Tue, Mar 25, 2014 at 08:15:54AM +0000, Chris Wilson wrote: > > On Tue, Mar 25, 2014 at 09:07:00AM +0100, Daniel Vetter wrote: > > > On Tue, Mar 25, 2014 at 9:03 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > > > > If we try to execute on a known ring, but it has failed to be > > > > initialised correctly, report that the GPU is hung rather than the > > > > command invalid. This leaves us reporting EINVAL only if the user > > > > requests execution on a ring that is not supported by the device. > > > > > > > > This should prevent UXA from getting stuck in a null render loop after a > > > > failed resume. > > > > > > > > Reported-by: Jiri Kosina <jikos@xxxxxxxx> > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > > > Feels a bit like duct-tape ... Shouldn't we instead stop tearing down > > > ringbuffer structures over suspend/resume? And maybe the same for init > > > with your patch applied. > > > > Even if we did, we would still end up with g45 failing to restart > > the ring at random, so we need some w/a. > > > > > Or we simply check for wedged first thing in execbuf ... > > > > See the first 2 patches ;-) The first is actually essential as we have > > no other guard against writing into the non-existent ring. > > > > I thought about doing that. However, I prefer doing arg validation > > first i.e. get all (or as much as is feasible) of the EINVAL checks out > > of the way so that we avoid touching hardware or leaking any information > > to a malicious client. The problem in this case is where is not actually > > an invalid arg. > > > > Note that we will detect the EIO later before touching hardware (so long > > as the first two patches are in place). > > Yeah I've seen the other patches. I think we should try to keep all the > ring structures around even when the hw init failed. I've made some feeble > attempts a while ago to split the structure init from the hw init stuff, > but kinda never fully materialized ... > > Imo if our set of valid rings semi-randomly changes at runtime even, > that's not good. Agreed, but sadly we can't trust hardware to always work, and we need something to prevent explosions. I quite like the idea of marking the GPU wedged if hw init fails so that we lose acceleration but keep modesetting around. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx