On Tue, Jan 19, 2016 at 05:01:00PM +0000, Arun Siluvery wrote:
> On 19/01/2016 16:42, Daniel Vetter wrote:
> >On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> >>On 19/01/2016 14:13, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>>>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>>>Pending reset requests are cleared before suspending; they should be
> >>>>>>>picked up after resume when new work is submitted.
> >>>>>>>
> >>>>>>>This was originally added as part of the TDR patches for Gen8 from
> >>>>>>>Tomas Elf, which are under review. As suggested by Chris, it is
> >>>>>>>extracted as a separate patch since it can be useful now.
> >>>>>>>
> >>>>>>>Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >>>>>>>Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>>>>>Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> >>>>>>
> >>>>>>Pulling in the discussion we had on irc: Imo the right approach is to
> >>>>>>simply wait for the gpu reset to finish its job. Since that could in
> >>>>>>turn leave a dead gpu behind (if we're unlucky and init_hw failed),
> >>>>>>we'd need to do that in a loop around gem_idle, and drop
> >>>>>>dev->struct_mutex in between. E.g.
> >>>>>>
> >>>>>>while (busy) {
> >>>>>>	mutex_lock();
> >>>>>>	gpu_idle();
> >>>>>>	mutex_unlock();
> >>>>>>
> >>>>>>	flush_work(reset_work);
> >>>>>>}
> >>>>>
> >>>>>Where does the requirement for gpu_idle come from? If there is a global
> >>>>>reset in progress, it cannot queue a request to flush the work, and
> >>>>>waiting on the old results will be skipped. So just wait for the global
> >>>>>reset to complete, i.e. flush_work().
> >>>>
> >>>>Yes, but the global reset might in turn leave a wrecked gpu behind, or
> >>>>at least a non-idle one. Hence another gpu_idle on top, to make sure.
> >>>>If we change init_hw() of the engines to be synchronous then we should
> >>>>have at least a WARN_ON(not_idle_but_i_expected_so()); in there ...
> >>
> >>gpu_error.work was removed in b8d24a06568368076ebd5a858a011699a97bfa42; we
> >
> >A git sha1 from your private tree is meaningless in public. Either link
> >to some git weburl or a mailing list archive link.
> 
> It is from the drm-intel repo:
> http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42
> 
> http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

Oh right, forgot that this landed, sorry for the confusion.

Summary of our irc discussion: We idle the gpu and flush the hangcheck
(which should flush the reset work), so at least with current upstream
there shouldn't be a bug. If there is a bug we need to understand it; we
can't just add code without a clear explanation and reasons. At best that
confuses, at worst it hides some real bugs.
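For the record, the sketch below is roughly what I mean, spelled out with
the current helper names. It is completely untested and only illustrates
the ordering: idle the gpu under struct_mutex, then flush the hangcheck so
a pending reset runs to completion, and repeat until no reset is left
pending. i915_gpu_idle(), i915_reset_in_progress() and
i915_terminally_wedged() are the existing helpers; I'm assuming the
hangcheck is the delayed work at gpu_error.hangcheck_work since b8d24a06,
so double-check that field name before lifting anything from this:

/* Untested sketch: quiesce any pending gpu reset before suspend. */
static int i915_suspend_quiesce(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;
	int ret;

	do {
		/* Idle the gpu, dropping struct_mutex again so the
		 * reset handler can grab it if it is still running. */
		mutex_lock(&dev->struct_mutex);
		ret = i915_gpu_idle(dev);
		mutex_unlock(&dev->struct_mutex);
		if (ret)
			return ret;

		/* Since b8d24a06 the reset runs from the hangcheck
		 * work, so flushing it waits for a pending reset (and
		 * the ring re-init it does) to finish. */
		flush_delayed_work(&dev_priv->gpu_error.hangcheck_work);
	} while (i915_reset_in_progress(&dev_priv->gpu_error) &&
		 !i915_terminally_wedged(&dev_priv->gpu_error));

	return 0;
}

The do-while is deliberate: we idle at least once even when no reset is
pending on entry, and a reset that fires between the idle and the flush
just sends us around the loop again.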
-Daniel

> 
> regards
> Arun
> 
> >
> >Thanks, Daniel
> >
> >>are now doing the reset in the hangcheck work itself, so I think there
> >>is no need to flush the work.
> >>
> >>while (i915_reset_in_progress(gpu_error) &&
> >>       !i915_terminally_wedged(gpu_error)) {
> >>	int ret;
> >>
> >>	mutex_lock(&dev->struct_mutex);
> >>	ret = i915_gpu_idle(dev);
> >>	if (ret)
> >>		DRM_ERROR("GPU is in inconsistent state after reset\n");
> >>	mutex_unlock(&dev->struct_mutex);
> >>}
> >>
> >>If the reset is successful we are idle before suspend, otherwise we are
> >>in a wedged state. Is this ok?
> >>
> >>regards
> >>Arun
> >>
> >>>
> >>>Does it matter on suspend? We test on resume if the GPU is usable, but
> >>>if we wanted to test on suspend then we should do
> >>>
> >>>flush_work();
> >>>if (i915_terminally_wedged())
> >>>	/* oh noes */;
> >>>-Chris
> >>>
> >>
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx