On Fri, Sep 01, 2017 at 11:02:11AM +0530, Sagar Arun Kamble wrote:
> Teardown of GuC HW/SW state was not properly done in unload path.
> During unload, we can rely on intel_guc_reset_prepare being done
> as part of i915_gem_suspend for disabling GuC interfaces.
> We will have to disable GuC submission prior to suspend as that involves
> communication with GuC to destroy doorbell. So intel_uc_fini_hw has to
> be called as part of i915_gem_suspend during unload as that really
> takes care of finishing the GuC operations. Created new parameter for
> i915_gem_suspend to handle unload/suspend path w.r.t gem and GuC suspend.
> GuC related allocations are cleaned up as part of intel_uc_cleanup_hw.

Is this still an accurate description after the changes in v2?

> 
> v2: Prepared i915_gem_unload. (Michal)
> 
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.c        |  6 +--
>  drivers/gpu/drm/i915/i915_drv.h        |  1 +
>  drivers/gpu/drm/i915/i915_gem.c        | 79 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_guc.c       | 13 ++++++
>  drivers/gpu/drm/i915/intel_guc.h       |  1 +
>  drivers/gpu/drm/i915/intel_uc.c        | 14 +++---
>  drivers/gpu/drm/i915/intel_uc_common.h |  1 +
>  7 files changed, 103 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index b2e8f95..b6cc2fe 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -601,7 +601,7 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
>  	i915_gem_drain_workqueue(dev_priv);
>  
>  	mutex_lock(&dev_priv->drm.struct_mutex);
> -	intel_uc_fini_hw(dev_priv);
> +	intel_uc_cleanup_hw(dev_priv);
>  	i915_gem_cleanup_engines(dev_priv);
>  	i915_gem_contexts_fini(dev_priv);
>  	i915_gem_cleanup_userptr(dev_priv);
> @@ -682,7 +682,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  	return 0;
>  
>  cleanup_gem:
> -	if (i915_gem_suspend(dev_priv))
> +	if (i915_gem_unload(dev_priv))
>  		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
>  	i915_gem_fini(dev_priv);
>  cleanup_uc:
> @@ -1375,7 +1375,7 @@ void i915_driver_unload(struct drm_device *dev)
>  
>  	i915_driver_unregister(dev_priv);
>  
> -	if (i915_gem_suspend(dev_priv))
> +	if (i915_gem_unload(dev_priv))
>  		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
>  
>  	intel_display_power_get(dev_priv, POWER_DOMAIN_INIT);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 064bf0f..570e71e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3628,6 +3628,7 @@ void i915_gem_reset_engine(struct intel_engine_cs *engine,
>  int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
>  			   unsigned int flags);
>  int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv);
> +int __must_check i915_gem_unload(struct drm_i915_private *dev_priv);
>  void i915_gem_resume(struct drm_i915_private *dev_priv);
>  int i915_gem_fault(struct vm_fault *vmf);
>  int i915_gem_object_wait(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 977500f..24cefe9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4624,6 +4624,85 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>  	return ret;
>  }
>  
> +int i915_gem_unload(struct drm_i915_private *dev_priv)
> +{
> +	struct drm_device *dev = &dev_priv->drm;
> +	int ret;
> +
> +	intel_runtime_pm_get(dev_priv);
> +	intel_suspend_gt_powersave(dev_priv);
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +	/* We have to flush all the executing contexts to main memory so
> +	 * that they can be saved in the hibernation image. To ensure the last
> +	 * context image is coherent, we have to switch away from it. That
> +	 * leaves the dev_priv->kernel_context still active when
> +	 * we actually suspend, and its image in memory may not match the GPU
> +	 * state. Fortunately, the kernel_context is disposable and we do
> +	 * not rely on its state.
> +	 */
> +	ret = i915_gem_switch_to_kernel_context(dev_priv);
> +	if (ret)
> +		goto err_unlock;
> +
> +	ret = i915_gem_wait_for_idle(dev_priv,
> +				     I915_WAIT_INTERRUPTIBLE |
> +				     I915_WAIT_LOCKED);
> +	if (ret)
> +		goto err_unlock;
> +
> +	assert_kernel_context_is_current(dev_priv);
> +	i915_gem_contexts_lost(dev_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> +	cancel_delayed_work_sync(&dev_priv->gt.retire_work);
> +
> +	/* As the idle_work is rearming if it detects a race, play safe and
> +	 * repeat the flush until it is definitely idle.
> +	 */
> +	while (flush_delayed_work(&dev_priv->gt.idle_work))
> +		;
> +
> +	/* Assert that we successfully flushed all the work and
> +	 * reset the GPU back to its idle, low power state.
> +	 */
> +	WARN_ON(dev_priv->gt.awake);
> +	WARN_ON(!intel_engines_are_idle(dev_priv));
> +
> +	/* Handle GuC suspension in case of unloading/system suspend */
> +	intel_uc_fini_hw(dev_priv);
> +
> +	/*
> +	 * Neither the BIOS, ourselves or any other kernel
> +	 * expects the system to be in execlists mode on startup,
> +	 * so we need to reset the GPU back to legacy mode. And the only
> +	 * known way to disable logical contexts is through a GPU reset.
> +	 *
> +	 * So in order to leave the system in a known default configuration,
> +	 * always reset the GPU upon unload and suspend. Afterwards we then
> +	 * clean up the GEM state tracking, flushing off the requests and
> +	 * leaving the system in a known idle state.
> +	 *
> +	 * Note that is of the upmost importance that the GPU is idle and
> +	 * all stray writes are flushed *before* we dismantle the backing
> +	 * storage for the pinned objects.
> +	 *
> +	 * However, since we are uncertain that resetting the GPU on older
> +	 * machines is a good idea, we don't - just in case it leaves the
> +	 * machine in an unusable condition.
> +	 */
> +	i915_gem_sanitize(dev_priv);
> +	goto out_rpm_put;
> +
> +err_unlock:
> +	mutex_unlock(&dev->struct_mutex);
> +out_rpm_put:
> +	intel_runtime_pm_put(dev_priv);
> +	return ret;
> +}
> +

^^^ This is just copypasta of i915_gem_suspend. One line has changed...
(well, I think it's no longer up-to-date with drm-tip, so there's more,
but s/intel_guc_system_suspend/intel_uc_fini_hw was the intention, right?)

We can do better.

>  void i915_gem_resume(struct drm_i915_private *dev_priv)
>  {
>  	struct drm_device *dev = &dev_priv->drm;
> diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
> index 1fd8599..da81c37 100644
> --- a/drivers/gpu/drm/i915/intel_guc.c
> +++ b/drivers/gpu/drm/i915/intel_guc.c
> @@ -343,3 +343,16 @@ int intel_guc_system_resume(struct intel_guc *guc)
>  	 */
>  	return ret;
>  }
> +
> +void intel_guc_fini(struct intel_guc *guc)
> +{
> +	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +	struct drm_device *dev = &dev_priv->drm;
> +
> +	if (i915.enable_guc_submission) {
> +		mutex_lock(&dev->struct_mutex);
> +		i915_guc_submission_disable(dev_priv);
> +		mutex_unlock(&dev->struct_mutex);
> +		intel_guc_reset_prepare(guc);
> +	}
> +}
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index bf4dda0..1b3305f 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -169,6 +169,7 @@ static inline u32 guc_ggtt_offset(struct i915_vma *vma)
>  int intel_guc_runtime_resume(struct intel_guc *guc);
>  int intel_guc_system_suspend(struct intel_guc *guc);
>  int intel_guc_system_resume(struct intel_guc *guc);
> +void intel_guc_fini(struct intel_guc *guc);
>  
>  static inline int intel_guc_send(struct intel_guc *guc, const u32 *action,
>  				 u32 len)
> diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
> index 30c004c..9216c73 100644
> --- a/drivers/gpu/drm/i915/intel_uc.c
> +++ b/drivers/gpu/drm/i915/intel_uc.c
> @@ -381,20 +381,16 @@ int intel_uc_init_hw(struct drm_i915_private *dev_priv)
>  
>  void intel_uc_fini_hw(struct drm_i915_private *dev_priv)
>  {
> +	intel_guc_fini(&dev_priv->guc);
> +}
> +
> +void intel_uc_cleanup_hw(struct drm_i915_private *dev_priv)
> +{
>  	guc_free_load_err_log(&dev_priv->guc);

Not really part of the patch... But still - why are we touching this here?
Isn't guc->log->vma created only if GuC is loaded?

>  
>  	if (!i915.enable_guc_loading)
>  		return;
>  
>  	if (i915.enable_guc_submission)
> -		i915_guc_submission_disable(dev_priv);
> -
> -	intel_guc_disable_communication(&dev_priv->guc);
> -
> -	if (i915.enable_guc_submission) {
> -		gen9_disable_guc_interrupts(dev_priv);
>  		i915_guc_submission_fini(dev_priv);
> -	}
> -
> -	i915_ggtt_disable_guc(dev_priv);
>  }

We now have intel_uc_fini_hw and intel_uc_cleanup_hw. intel_uc_fini_hw is
called by i915_gem_unload, intel_uc_cleanup_hw is called by i915_gem_fini -
so, in that order, we're first calling fini, then calling cleanup. Both are
only used on the unload path, while intel_uc_init_hw is called on load,
resume, and reset (by i915_gem_init_hw).

We should probably redo the naming at this point... Perhaps stick with
unload rather than fini_hw? Or at least comment on the ordering.
-Michał

> diff --git a/drivers/gpu/drm/i915/intel_uc_common.h b/drivers/gpu/drm/i915/intel_uc_common.h
> index f454f73..c6e873d 100644
> --- a/drivers/gpu/drm/i915/intel_uc_common.h
> +++ b/drivers/gpu/drm/i915/intel_uc_common.h
> @@ -97,5 +97,6 @@ struct intel_uc_fw {
>  void intel_uc_fini_fw(struct drm_i915_private *dev_priv);
>  int intel_uc_init_hw(struct drm_i915_private *dev_priv);
>  void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
> +void intel_uc_cleanup_hw(struct drm_i915_private *dev_priv);
>  
>  #endif
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx