On Mon, 2013-11-11 at 16:31 +0100, Daniel Vetter wrote: > On Mon, Nov 11, 2013 at 02:58:31PM +0000, Siluvery, Arun wrote: > > From: "Siluvery, Arun" <arun.siluvery@xxxxxxxxx> > > > > This patchset contains changes for Timeout detection and recovery (TDR) which > > provides per-engine hang detection and recovery. > > The current driver performs full gpu reset in case of a hang, TDR attempts to > > only reset the engine that is hung and it falls back to full reset if it fails. > > > > Full GPU reset can leave the system in a state where the display updates > > intermittently and possibly lock-up depending on the work load at the time of > > hang. TDR can help recover the system in those case thus increasing the stability. > > Are these hw lockups you've seen with full gpu reset or just kernel > deadlocks? If it's the latter we've recently (re-)fixed a bunch of those, > and if there are new ones we definitely want to fix them and add testcases > to igt. So if you could share some of these hangs and their > analysis/testcases that's be very interesting. > > That's of course on top of any other reset improvements. I think these are kernel lockups, unfortunately when this happens there is no response from the kernel, sending break is also not helping. I will try to get more details on this. > > > The changes are split in multiple patches. > > 1. Ring utility functions to save/restore context, reset ring etc > > 2. TDR hang detection logic and error recovery function > > 3. Debugfs changes to export TDR statistics. > > > > I have tested these changes on drm-intel-nightly with simple test which > > inserts a bad batch buffer on the specific to trigger a hang. TDR logic > > then detects this and recovers from it by skipping the bad batch. > > I want this testcase (as a patch to igt). ok, I will send it to the mailing list. > > > Please review and give your comments. > > I'll try to have a look later this week, atm still busy with bdw > upstreaming. One more meta-comment though: Something with your git setup > seems to be broken, the patches don't have in-reply-to headers pointing at > this cover letter and hence the threading is a bit broken. ok thanks. yes my mistake I missed an option while generating the patches. Do you suggest resending all patches again? > > Cheers, Daniel > > > > regards > > Arun > > > > Siluvery, Arun (3): > > drm/1915: Add ring functions to save/restore context for per-ring > > reset > > drm/i915: Per-engine Timeout detection and recovery on HSW > > drm/i915: Export TDR hang count to debugfs > > > > drivers/gpu/drm/i915/i915_debugfs.c | 68 +++- > > drivers/gpu/drm/i915/i915_dma.c | 16 +- > > drivers/gpu/drm/i915/i915_drv.c | 195 +++++++++- > > drivers/gpu/drm/i915/i915_drv.h | 92 ++++- > > drivers/gpu/drm/i915/i915_gem.c | 77 +++- > > drivers/gpu/drm/i915/i915_gpu_error.c | 25 +- > > drivers/gpu/drm/i915/i915_irq.c | 556 ++++++++++++++++------------- > > drivers/gpu/drm/i915/i915_reg.h | 7 + > > drivers/gpu/drm/i915/intel_display.c | 25 +- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 607 +++++++++++++++++++++++++++++++- > > drivers/gpu/drm/i915/intel_ringbuffer.h | 51 +++ > > drivers/gpu/drm/i915/intel_uncore.c | 31 +- > > include/drm/drmP.h | 7 + > > 13 files changed, 1467 insertions(+), 290 deletions(-) > > > > -- > > 1.8.4 > > > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx