Quoting Daniele Ceraolo Spurio (2019-09-09 17:27:47) > > > On 9/7/19 1:39 AM, Chris Wilson wrote: > > Quoting Daniele Ceraolo Spurio (2019-09-06 23:28:05) > >> > >> > >> On 9/5/19 2:09 AM, Janusz Krzysztofik wrote: > >>> When trying to reset a device with reset capability disabled or not > >>> supported while rings are full of requests, it has been observed when > >>> running in execlists submission mode that command stream buffer tail > >>> tends to be incremented by apparently still running GPU regardless of > >>> all requests being already cancelled and command stream buffer pointers > >>> reset. As a result, kernel panic on NULL pointer dereference occurs > >>> when a trace_ports() helper is called with command stream buffer tail > >>> incremented but request pointers being NULL during final > >>> __intel_gt_set_wedged() operation called from intel_gt_reset(). > >>> > >>> Skip actual reset procedure if reset is disabled or not supported. > >> > >> This last sentence is a bit confusing. You're not skipping the reset > >> procedure, you're skipping the attempt of unwedging and resetting again > >> after a reset & wedge already happened. > > > > Loss of email over the last week, so jumping in at the end. My gut > > response is that this is still just papering over the bug, as what you > > say above makes no sense. > > -Chris > > > > The issue here is that if we don't reset the HW when we wedge, whatever > was running on the engines might complete at any point after that, which > generates an unexpected post-wedge CSB event that we don't handle > gracefully when we unwedge. Indeed, until we call reset_default_submission all those unexpected interrupts are redirected to the nop_submission_tasklet. I think it should be more along the lines of struct intel_timeline *tl; unsigned long flags; + bool ok; if (!test_bit(I915_WEDGED, >->reset.flags)) return true; @@ -838,7 +839,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) } spin_unlock_irqrestore(&timelines->lock, flags); - intel_gt_sanitize(gt, false); + ok = false; + if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display) + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; + if (!ok) + return false; /* * Undo nop_submit_request. We prevent all new i915 requests from For bonus points, gpu_reset_clobbers_display should take into account whether the display is active. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx