Quoting Tvrtko Ursulin (2019-03-07 13:07:18)
>
> On 06/03/2019 14:24, Chris Wilson wrote:
> > +static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
> > +{
> > +	if (i915_gem_switch_to_kernel_context(i915))
> > +		return false;
>
> Is it worth still trying to idle if this fails? Since the timeout is
> short, maybe a reset in the idle state brings less havoc than not. It
> can only fail on memory allocations, I think, and when terminally
> wedged, in which case it is still okay.

Terminally wedged is hardwired to return 0 (in this or the next patch?)
so that we don't bail during i915_gem_suspend() for this reason.

We do still idle if this fails, as we mark the driver/GPU as wedged:
perform a GPU reset so that it hopefully isn't shooting kittens any
more, and pull a pillow over our heads. That's pretty much our answer
in every case: wedge and hope the users go away.

> > +
> > +	if (i915_gem_wait_for_idle(i915,
> > +				   I915_WAIT_LOCKED |
> > +				   I915_WAIT_FOR_IDLE_BOOST,
> > +				   HZ / 10))
>
> You'll presumably change HZ / 10 to that new macro.

Yup.

> >  static void
> >  i915_gem_idle_work_handler(struct work_struct *work)
> >  {
> > -	struct drm_i915_private *dev_priv =
> > -		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > +	struct drm_i915_private *i915 =
> > +		container_of(work, typeof(*i915), gt.idle_work.work);
> > +	typeof(i915->gt) *gt = &i915->gt;
>
> I am really not sure about the typeof idiom in normal C code. :( It
> saves a little bit of typing, and a little bit of churn if the type
> name changes, but it just feels weird to use it in some places and not
> others.

But then we have to name things! We're not sold on gt; it means quite a
few different things around the bspec. This bit is actually a part that
I'm earmarking for i915_gem itself (high-level idle/power/user
management), and I'm contemplating i915_gpu for the bits beneath us
that are still a management layer over the hardware (and intel_foo for
the bits that talk to the hardware). Maybe that too will change if we
completely split out into different modules.

> >  	/*
> > -	 * New request retired after this work handler started, extend active
> > -	 * period until next instance of the work.
> > +	 * Flush out the last user context, leaving only the pinned
> > +	 * kernel context resident. Should anything unfortunate happen
> > +	 * while we are idle (such as the GPU being power cycled), no users
> > +	 * will be harmed.
> >  	 */
> > -	if (new_requests_since_last_retire(dev_priv))
> > -		goto out_unlock;
> > -
> > -	__i915_gem_park(dev_priv);
> > +	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
>
> Could have left the new_requests_since_last_retire for readability.

As always, it changed several times. I actually think that, considering
we play games with active_requests, keeping that check in the open is
important. ~o~

> > +		++gt->active_requests; /* don't requeue idle */
> > +
> > +		if (!switch_to_kernel_context_sync(i915)) {
> > +			dev_err(i915->drm.dev,
> > +				"Failed to idle engines, declaring wedged!\n");
> > +			GEM_TRACE_DUMP();
> > +			i915_gem_set_wedged(i915);
> > +		}
> > +		i915_retire_requests(i915);
> > @@ -3128,7 +3120,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
> >  		return err;
> >
> >  	i915_retire_requests(i915);
> > -	GEM_BUG_ON(i915->gt.active_requests);
>
> Belongs to this patch?

Yes. See the above chunk.
-Chris
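
For reference, a minimal standalone sketch of the typeof alias idiom
debated above. typeof is a GNU C extension; the structure and field
names here are hypothetical, chosen only for illustration, and are not
the i915 ones.

	#include <stdio.h>

	struct engine_stats {
		unsigned int active_requests;
		unsigned int parked;
	};

	struct device_priv {
		struct engine_stats gt;
	};

	static void idle_handler(struct device_priv *priv)
	{
		/*
		 * Alias the embedded struct without spelling out its type
		 * name; if the type of priv->gt ever changes, this
		 * declaration does not need to.
		 */
		typeof(priv->gt) *gt = &priv->gt;

		if (!gt->active_requests)
			gt->parked = 1;
	}

	int main(void)
	{
		struct device_priv priv = { .gt = { .active_requests = 0 } };

		idle_handler(&priv);
		printf("parked = %u\n", priv.gt.parked);
		return 0;
	}

The only saving is that the local alias tracks whatever type priv->gt
happens to be, which is the churn-avoidance argument made above.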
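Likewise, a hedged sketch of the delayed-work idle pattern the hunks
above revise, using only stock workqueue APIs (container_of,
INIT_DELAYED_WORK, mod_delayed_work, work_pending). The my_dev
structure and its fields are hypothetical stand-ins, not the
drm_i915_private layout, and the parking action is reduced to a
dev_info().

	#include <linux/device.h>
	#include <linux/jiffies.h>
	#include <linux/workqueue.h>

	struct my_dev {
		struct device *dev;
		struct {
			unsigned int active_requests;
			struct delayed_work idle_work;
		} gt;
	};

	static void my_idle_work_handler(struct work_struct *work)
	{
		struct my_dev *md =
			container_of(work, typeof(*md), gt.idle_work.work);
		typeof(md->gt) *gt = &md->gt;

		/*
		 * Only park if nothing became busy since the timer was armed
		 * and no further idle pass is already queued, mirroring the
		 * check in the hunk above.
		 */
		if (!gt->active_requests && !work_pending(&gt->idle_work.work))
			dev_info(md->dev, "parking\n");
	}

	static void my_dev_init(struct my_dev *md)
	{
		INIT_DELAYED_WORK(&md->gt.idle_work, my_idle_work_handler);
	}

	static void my_dev_mark_busy(struct my_dev *md)
	{
		/* (Re)arm the idle handler ~100ms after the last activity. */
		mod_delayed_work(system_wq, &md->gt.idle_work, HZ / 10);
	}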