On 07/03/2019 13:29, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-03-07 13:07:18)
On 06/03/2019 14:24, Chris Wilson wrote:
+static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+{
+ if (i915_gem_switch_to_kernel_context(i915))
+ return false;
Is it worth still trying to idle if this fails? Since the timeout is
short, maybe a reset in the idle state brings less havoc than not idling.
It can only fail on memory allocations, I think (okay, and on being
terminally wedged), in which case it is still okay.
Terminally wedged is hardwired to return 0 (this patch, or the next?) so
that we don't bail during i915_gem_suspend() for this reason.
We do still idle if this fails, as we mark the driver/GPU as wedged.
Perform a GPU reset so that it hopefully isn't shooting kittens any
more, and pull a pillow over our heads.
I didn't find a path which idles before wedging when
switch_to_kernel_context_sync fails due to
i915_gem_switch_to_kernel_context failing. Where is it?
It is a minor concern, don't get me wrong; it is unlikely to fail like
this. I was simply thinking: why not try to wait for the current work to
finish before suspending in this case? It might make for a better
experience after resume.
That's pretty much our answer in every case: wedge and hope the users go
away.
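
[To illustrate the alternative being floated above, a sketch only and not
what the patch does: drop the early return so we still drain whatever is
in flight even when queueing the switch fails, and only report failure
afterwards. Function names and flags are taken from the quoted hunks.]

static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
{
	bool result = true;

	/* Queueing the switch can fail (e.g. -ENOMEM); note it but carry on. */
	if (i915_gem_switch_to_kernel_context(i915))
		result = false;

	/* Still wait for the current work before suspend/reset. */
	if (i915_gem_wait_for_idle(i915,
				   I915_WAIT_LOCKED |
				   I915_WAIT_FOR_IDLE_BOOST,
				   HZ / 10))
		result = false;

	return result;
}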
+
+ if (i915_gem_wait_for_idle(i915,
+ I915_WAIT_LOCKED |
+ I915_WAIT_FOR_IDLE_BOOST,
+ HZ / 10))
You'll presumably change HZ / 10 to that new macro.
Yup.
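
[For illustration, something along these lines; the macro name below is a
placeholder, as the real one comes from an earlier patch and is not
quoted in this thread.]

#define I915_GEM_IDLE_TIMEOUT (HZ / 10) /* placeholder name, for illustration */

	if (i915_gem_wait_for_idle(i915,
				   I915_WAIT_LOCKED |
				   I915_WAIT_FOR_IDLE_BOOST,
				   I915_GEM_IDLE_TIMEOUT))
		return false;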
static void
i915_gem_idle_work_handler(struct work_struct *work)
{
- struct drm_i915_private *dev_priv =
- container_of(work, typeof(*dev_priv), gt.idle_work.work);
+ struct drm_i915_private *i915 =
+ container_of(work, typeof(*i915), gt.idle_work.work);
+ typeof(i915->gt) *gt = &i915->gt;
I am really not sure about the typeof idiom in normal C code. :( It
saves a little bit of typing, and a little bit of churn if the type name
changes, but it just feels weird to use it in some places and not others.
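
[A minimal illustration of the typeof idiom under discussion, with
hypothetical struct names; if memory serves, i915->gt was an anonymous
struct at this point, which is part of why typeof is attractive: there is
no type name to spell out.]

/* build with gcc (GNU C), e.g. gcc -std=gnu11 -c typeof_demo.c */
struct demo_private {			/* stand-in for drm_i915_private */
	struct {			/* anonymous, like i915->gt */
		unsigned int active_requests;
	} gt;
};

static inline void demo(struct demo_private *i915)
{
	typeof(i915->gt) *gt = &i915->gt;	/* GNU C typeof: alias without naming the type */

	gt->active_requests++;
}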
But then we have to name the thing! We're not sold on gt; it means quite a
few different things around the bspec. This bit is actually a part I'm
earmarking for i915_gem itself (high-level idle/power/user management),
and I'm contemplating i915_gpu for the bits that sit beneath us but are
still a management layer over the hardware (and intel_foo for the bits
that talk to the hardware; maybe that too will change if we completely
split out into different modules).
So you could have left it as-is for now and had a smaller diff. But
okay... have it if you insist.
/*
- * New request retired after this work handler started, extend active
- * period until next instance of the work.
+ * Flush out the last user context, leaving only the pinned
+ * kernel context resident. Should anything unfortunate happen
+ * while we are idle (such as the GPU being power cycled), no users
+ * will be harmed.
*/
- if (new_requests_since_last_retire(dev_priv))
- goto out_unlock;
-
- __i915_gem_park(dev_priv);
+ if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
Could have left the new_requests_since_last_retire for readability.
As always, it changed several times. I actually think that, considering
we play games with active_requests, keeping that in the open is
important. ~o~
Ok.
+ ++gt->active_requests; /* don't requeue idle */
+
+ if (!switch_to_kernel_context_sync(i915)) {
+ dev_err(i915->drm.dev,
+ "Failed to idle engines, declaring wedged!\n");
+ GEM_TRACE_DUMP();
+ i915_gem_set_wedged(i915);
+ }
+ i915_retire_requests(i915);
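
[For reference, the open-coded check above is the inverse of the
new_requests_since_last_retire() helper mentioned earlier, which reads
roughly like this; quoted from memory, so treat as approximate.]

static bool new_requests_since_last_retire(const struct drm_i915_private *i915)
{
	return (READ_ONCE(i915->gt.active_requests) ||
		work_pending(&i915->gt.idle_work.work));
}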
@@ -3128,7 +3120,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
return err;
i915_retire_requests(i915);
- GEM_BUG_ON(i915->gt.active_requests);
Belongs to this patch?
Yes. See the above chunk.
The idle work handler? More hand-holding required, I'm afraid.
Regards,
Tvrtko