Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling

On 2/21/19 1:17 PM, Chris Wilson wrote:
Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)

<snip>

@@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
        * state. Fortunately, the kernel_context is disposable and we do
        * not rely on its state.
        */
-     if (!i915_terminally_wedged(&i915->gpu_error)) {
-             ret = i915_gem_switch_to_kernel_context(i915);
-             if (ret)
-                     goto err_unlock;
-
-             ret = i915_gem_wait_for_idle(i915,
-                                          I915_WAIT_INTERRUPTIBLE |
-                                          I915_WAIT_LOCKED |
-                                          I915_WAIT_FOR_IDLE_BOOST,
-                                          HZ / 5);
-             if (ret == -EINTR)
-                     goto err_unlock;
-
+     if (!switch_to_kernel_context_sync(i915)) {
               /* Forcibly cancel outstanding work and leave the gpu quiet. */
               i915_gem_set_wedged(i915);
       }
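
To make the control flow of the new hunk easier to follow, here is a minimal userspace model in C. It is a hypothetical sketch, not i915 code: `struct model_gpu` and all `model_*` functions are invented for illustration. It assumes, as the removed lines suggest, that switch_to_kernel_context_sync() folds the context switch and the bounded wait-for-idle into one call that returns false on failure, after which the caller wedges the GPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device; these are NOT the real i915 structs. */
struct model_gpu {
	bool wedged;        /* set once outstanding work has been cancelled */
	bool on_kernel_ctx; /* last context switched to is the kernel context */
	int  busy_ticks;    /* remaining work; 0 means idle */
};

/* Stand-in for submitting a switch to the kernel context. */
static bool model_switch_to_kernel_context(struct model_gpu *gpu)
{
	if (gpu->wedged)
		return false;
	gpu->on_kernel_ctx = true;
	return true;
}

/* Stand-in for a bounded wait-for-idle: each tick retires one unit of work. */
static bool model_wait_for_idle(struct model_gpu *gpu, int timeout_ticks)
{
	while (gpu->busy_ticks > 0 && timeout_ticks-- > 0)
		gpu->busy_ticks--;
	return gpu->busy_ticks == 0;
}

/* Synchronous switch: request the switch, then wait (bounded) for idle. */
static bool model_switch_to_kernel_context_sync(struct model_gpu *gpu,
						int timeout_ticks)
{
	if (!model_switch_to_kernel_context(gpu))
		return false;
	return model_wait_for_idle(gpu, timeout_ticks);
}

/* Suspend path mirroring the "+" side of the hunk: wedge only on failure. */
static void model_gem_suspend(struct model_gpu *gpu, int timeout_ticks)
{
	if (!model_switch_to_kernel_context_sync(gpu, timeout_ticks)) {
		/* Forcibly cancel outstanding work and leave the gpu quiet. */
		gpu->wedged = true;
		gpu->busy_ticks = 0;
	}
	assert(gpu->busy_ticks == 0); /* either way, nothing is left running */
}
```

For example, a model GPU with 3 units of pending work and a 10-tick budget suspends cleanly on the kernel context, while one with 50 units exceeds the budget and ends up wedged with its work cancelled. In both cases the GPU is quiet when suspend returns.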

GuC-related question: what's your expectation here with regard to GuC
status? The current i915 flow expects either uc_reset_prepare() or
uc_suspend() to be called to clean up the GuC status, but we're calling
neither of them here if the switch is successful. Do you expect the
resume code to always blank out the GuC status before a reload?

(A few patches later on I propose that we always just do a reset+wedge
on suspend in lieu of hangcheck.)

On resume, we have to bring the HW up from scratch and do another reset
in the process. Some platforms have been known to survive trips to
PCI_D3 with their state intact (someone is lying!), so we _have_ to do a
reset to be sure we clear the HW state. I expect we would need to force
a reset on resume even for the GuC, to be sure we cover all cases, such
as kexec.
-Chris
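
The argument for an unconditional reset on resume can be illustrated with a small model. This is a hypothetical sketch: `struct model_hw`, `model_reset()`, `model_resume()` and `model_resume_trusting()` are invented names, and `stale_state` stands in for any engine or GuC state that survived a trip to PCI_D3 (or a kexec) even though the platform claims it was lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: what the platform reports vs. what the HW really kept. */
struct model_hw {
	bool reported_power_loss; /* platform claims state was lost in D3 */
	bool stale_state;         /* leftover HW state actually still present */
	int  resets;              /* number of resets issued on resume */
};

static void model_reset(struct model_hw *hw)
{
	hw->stale_state = false;
	hw->resets++;
}

/* Naive variant that trusts the platform; shown only for contrast. */
static void model_resume_trusting(struct model_hw *hw)
{
	if (!hw->reported_power_loss)
		model_reset(hw);
}

/*
 * Unconditional variant: ignore what the platform reports and always reset,
 * so a device that "survived" D3 (or a kexec'd kernel) starts from known state.
 */
static void model_resume(struct model_hw *hw)
{
	model_reset(hw);
	assert(!hw->stale_state);
}
```

With a platform that reports a power loss but keeps its state, the trusting variant skips the reset and leaves stale state behind, while the unconditional variant always starts clean at the cost of one extra reset.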

My question here was less about the HW state and more about the SW tracking. At which point do we stop GuC communication and mark the GuC as not loaded/accessible? For example, we need to disable and re-enable the CT buffers before the GuC is reset/suspended to make sure the shared memory area is cleaned correctly (we currently avoid memsetting all of it on reload, since it is quite big). Also, communication with the GuC is going to increase going forward, so we'll need to make sure we accurately track its state and do all the relevant cleanups.
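
A rough model of the SW-side GuC tracking being asked about: close the CT channel while the GuC is still alive, mark the GuC as not loaded, and only then let it be reset or suspended, so that a later reload can reuse the shared buffer without a full memset. This is a hypothetical sketch; `struct model_guc` and the `model_guc_*` helpers are illustrative names, not the i915 uC API, and the assumption that disabling CT is what leaves the shared area reusable is taken from the paragraph above rather than from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SW-side tracking of the GuC; names are illustrative only. */
enum model_guc_state { MODEL_GUC_NOT_LOADED, MODEL_GUC_RUNNING };

struct model_guc {
	enum model_guc_state state;
	bool ct_enabled;   /* CT (command transport) channel open */
	bool ct_buf_clean; /* shared CT memory left in a reusable state */
};

static bool model_guc_can_send(const struct model_guc *guc)
{
	return guc->state == MODEL_GUC_RUNNING && guc->ct_enabled;
}

/* Orderly teardown: close CT while the GuC is alive, then mark it gone. */
static void model_guc_sanitize(struct model_guc *guc)
{
	if (guc->ct_enabled) {
		guc->ct_enabled = false;
		guc->ct_buf_clean = true; /* assumed: disable path tidies the buffer */
	}
	guc->state = MODEL_GUC_NOT_LOADED;
}

/* Reload after reset/resume; relies on the buffer already being clean. */
static void model_guc_load(struct model_guc *guc)
{
	assert(guc->state == MODEL_GUC_NOT_LOADED); /* teardown must have run */
	assert(guc->ct_buf_clean); /* otherwise the whole area needs clearing */
	guc->state = MODEL_GUC_RUNNING;
	guc->ct_enabled = true;
	guc->ct_buf_clean = false; /* buffer is in use again */
}
```

In this model, a suspend path that skips model_guc_sanitize() (as the successful-switch path in the hunk appears to skip uc_suspend()) would trip the first assertion in model_guc_load() on the next reload.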

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx