Quoting Daniele Ceraolo Spurio (2019-02-21 21:31:45)
>
> On 2/21/19 1:17 PM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
> >>
> >> <snip>
> >>
> >>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >>>  	 * state. Fortunately, the kernel_context is disposable and we do
> >>>  	 * not rely on its state.
> >>>  	 */
> >>> -	if (!i915_terminally_wedged(&i915->gpu_error)) {
> >>> -		ret = i915_gem_switch_to_kernel_context(i915);
> >>> -		if (ret)
> >>> -			goto err_unlock;
> >>> -
> >>> -		ret = i915_gem_wait_for_idle(i915,
> >>> -					     I915_WAIT_INTERRUPTIBLE |
> >>> -					     I915_WAIT_LOCKED |
> >>> -					     I915_WAIT_FOR_IDLE_BOOST,
> >>> -					     HZ / 5);
> >>> -		if (ret == -EINTR)
> >>> -			goto err_unlock;
> >>> -
> >>> +	if (!switch_to_kernel_context_sync(i915)) {
> >>> +		/* Forcibly cancel outstanding work and leave the gpu quiet. */
> >>>  		i915_gem_set_wedged(i915);
> >>>  	}
> >>
> >> GuC-related question: what's your expectation here with regard to the
> >> GuC status? The current i915 flow expects either uc_reset_prepare() or
> >> uc_suspend() to be called to clean up the GuC status, but we're calling
> >> neither of them here if the switch is successful. Do you expect the
> >> resume code to always blank out the GuC status before a reload?
> >
> > (A few patches later on I propose that we always just do a reset+wedge
> > on suspend in lieu of hangcheck.)
> >
> > On resume, we have to bring the HW up from scratch and do another reset
> > in the process. Some platforms have been known to survive the trips to
> > PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
> > clear the HW state. I expect we would need to force a reset on resume
> > even for the GuC, to be sure we cover all cases such as kexec.
> > -Chris
>
> More than the HW state, my question here was about the SW tracking. At
> which point do we go and stop GuC communication and mark the GuC as not
> loaded/accessible? E.g. we need to disable and re-enable the CT buffers
> before the GuC is reset/suspended to make sure the shared memory area is
> cleaned up correctly (we currently avoid memsetting all of it on reload
> since it is quite big). Also, communication with the GuC is going to
> increase going forward, so we'll need to make sure we accurately track
> its state and do all the relevant cleanups.

Across suspend/resume, we issue a couple of resets and scrub/sanitize
our state tracking. By the time we load the fw again, both the fw and
our state should be starting from scratch.

That all seems unavoidable, so I am not understanding the essence of
your question.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx