On Mon, Feb 13, 2017 at 12:34:26PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> > At least we now do the irq_barrier hammer once at the start in reset_prepare,
> > so we should be better, but I'm wondering if we want to store the
> > request from prepare and then double check in the actual reset.
> >
>
> #1 store seqno from hangcheck
> #2 get mutex for reset
> #3 barrier
> #4 find_request (only once)
> #5 on prepare path, check the submachinery
>    against this req and if inconsistent, queue hangcheck
>    and return from prepare without resetting.
>
> ?

Per-engine, or global voting.

The issue I just found with the find_request once plan is that it does
have to be after the reset / hw is truly idle. And then we should employ
a really big hammer on top.

The problem is that for legacy ringbuffer submission, we restart from the
point of last retirement, and so if our retirement is inaccurate we may
replay one too many requests. Hmm, we may get corruption either way of
course, so perhaps this is not as big an issue as I thought.

Aborting the reset if we detect, just before we do it, that the GPU has
recovered is definitely an improvement over my plan.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
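
For reference, the #1..#5 sequence and the "abort if the GPU recovered"
idea can be sketched as a small standalone model. This is not i915 code:
every name below (toy_engine, toy_find_request, toy_reset_prepare, ...) is
hypothetical. The mutex and barrier steps are left out because the model is
single threaded. It also assumes that "inconsistent" means the hardware
seqno has moved since hangcheck sampled it, which is only one possible
reading of step #5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all types and functions here are hypothetical. */

struct toy_request {
	uint32_t seqno;			/* assigned at submission, increasing */
};

struct toy_engine {
	uint32_t hw_seqno;		/* last seqno the hardware completed */
	uint32_t hangcheck_seqno;	/* #1: seqno stored when hangcheck fired */
	const struct toy_request *queue;	/* requests in submission order */
	size_t count;
};

enum toy_prepare_result {
	TOY_RESET_PROCEED,	/* still stuck on the same request: reset */
	TOY_RESET_ABORT,	/* engine idle or advanced: requeue hangcheck */
};

/* Wraparound-tolerant "has the hardware passed this seqno?" check. */
static bool toy_seqno_passed(uint32_t hw, uint32_t seqno)
{
	return (int32_t)(hw - seqno) >= 0;
}

/* #4: find the first request the hardware has not completed (done once). */
static const struct toy_request *toy_find_request(const struct toy_engine *e)
{
	for (size_t i = 0; i < e->count; i++)
		if (!toy_seqno_passed(e->hw_seqno, e->queue[i].seqno))
			return &e->queue[i];
	return NULL;
}

/*
 * #5: on the prepare path, compare the current state against what
 * hangcheck saw. Steps #2 (mutex) and #3 (barrier) are omitted here.
 */
enum toy_prepare_result
toy_reset_prepare(const struct toy_engine *e, const struct toy_request **guilty)
{
	const struct toy_request *rq;

	assert(e != NULL && guilty != NULL);
	*guilty = NULL;

	rq = toy_find_request(e);
	if (rq == NULL)
		return TOY_RESET_ABORT;	/* nothing outstanding: engine is idle */

	/* Assumed meaning of "inconsistent": progress since hangcheck. */
	if (e->hw_seqno != e->hangcheck_seqno)
		return TOY_RESET_ABORT;

	*guilty = rq;
	return TOY_RESET_PROCEED;
}
```

A caller would fill in a toy_engine, call toy_reset_prepare(), and only go
on to the actual reset on TOY_RESET_PROCEED. On TOY_RESET_ABORT it would
queue hangcheck again instead. Whether the comparison should be per-engine
or a global vote is left open here, as it is in the thread.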