Re: [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/26/2018 10:27 PM, Daniele Ceraolo Spurio wrote:


On 25/02/18 22:17, Sagar Arun Kamble wrote:


On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:


On 23/02/18 06:04, Michal Wajdeczko wrote:
Right after GPU reset there will be a small window of time during which
some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
     don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
Cc: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem.c    |  5 ++++-
  drivers/gpu/drm/i915/intel_guc.h   |  5 +++++
  drivers/gpu/drm/i915/intel_huc.h   |  5 +++++
  drivers/gpu/drm/i915/intel_uc.c    | 14 ++++++++++++++
  drivers/gpu/drm/i915/intel_uc.h    |  1 +
  drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++++++
  6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
      }
        i915_gem_revoke_fences(dev_priv);
+    intel_uc_prepare_to_reset(dev_priv);
        return err;
  }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private *i915)        * it may impact the display and we are uncertain about the stability
       * of the reset, so this could be applied to even earlier gen.
       */
-    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+        intel_uc_prepare_to_reset(i915);

This leaves the status with an incorrect value if we boot with i915.reset=0,
It depends on whether WOPCM is locked (In case of resume from S3 I have seen it to be the case often).
Then we need not reload GuC also unless we are not doing full GPU reset.
but I think this is still the right place to add this in.
Yes
There are several things with GuC that are going to break if we use reset=0 (e.g. doorbell cleanup)
Can you elaborate how it might break.
i915 isn't currently communicating to GuC (destroy_doorbell) during doorbell cleanup and if we start communicating then it should not fail as GuC will be available with reset=0.  Also __intel_uc_reset_hw isn't gated by reset modparam.

As you said we do always reset GuC no matter the value of the modparam, but that does not reset the doorbell HW. If we're coming out of S3 and the state as been preserved this will cause the doorbell HW to be left in an unclean state, which could cause spurious doorbell interrupts to be sent to GuC, not sure how the firmware handles those. The code as moved since last time I looked at this in detail and I think we're now most likely going to overwrite those unclean doorbells, but there are unlikely corner cases (preempt context failing to be created) where we might not do so. More generally, my concern was that in the code flow we assume GuC and related HW to be reset and in need of a re-init when we come out of suspend when actually as you reported that might not be the case if we have reset=0. Even if we have no major concerns now, issues might arise in the future after code reworks or new feature additions if we start from a wrong assumption. Instead of changing the flow to consider the reset=0 (which isn't really a supported scenario) I think it'd be more useful to just enforce the fact that we don't support that use-case with GuC, hence my suggestion. And yes, I'm probably just being uber-paranoid :P

Makes sense ..... Agree on sanitizing with GuC to now allow reset=0
We could also fix this if we could reset doorbell unit alone at resume and acquire needed doorbells but AFAIK earlier guc_init_doorbell_hw is the way to reset all doorbells (that needed GuC). As you said we can skip these changes though since reset=0 isn't supported scenario.
Daniele

so I wouldn't consider this a regression, but we might want to start sanitizing the modparams to not allow reset=0 with GuC.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>

Daniele

WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+    }
  }
    int i915_gem_suspend(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 52856a9..0f6adb1 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma *vma)   struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size);
  u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
  +static inline void intel_guc_prepare_to_reset(struct intel_guc *guc)
+{
+    intel_uc_fw_prepare_to_reset(&guc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_huc.h b/drivers/gpu/drm/i915/intel_huc.h
index 40039db..96e24f9 100644
--- a/drivers/gpu/drm/i915/intel_huc.h
+++ b/drivers/gpu/drm/i915/intel_huc.h
@@ -38,4 +38,9 @@ struct intel_huc {
  int intel_huc_init_hw(struct intel_huc *huc);
  int intel_huc_auth(struct intel_huc *huc);
  +static inline void intel_huc_prepare_to_reset(struct intel_huc *huc)
+{
+    intel_uc_fw_prepare_to_reset(&huc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
index 9f1bac6..8042d4b 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -445,3 +445,17 @@ void intel_uc_fini_hw(struct drm_i915_private *dev_priv)
      if (USES_GUC_SUBMISSION(dev_priv))
          gen9_disable_guc_interrupts(dev_priv);
  }
+
+void intel_uc_prepare_to_reset(struct drm_i915_private *i915)
+{
+    struct intel_huc *huc = &i915->huc;
+    struct intel_guc *guc = &i915->guc;
+
+    if (!USES_GUC(i915))
+        return;
+
+    GEM_BUG_ON(!HAS_GUC(i915));
+
+    intel_huc_prepare_to_reset(huc);
+    intel_guc_prepare_to_reset(guc);
+}
diff --git a/drivers/gpu/drm/i915/intel_uc.h b/drivers/gpu/drm/i915/intel_uc.h
index f2984e0..7a8ae58 100644
--- a/drivers/gpu/drm/i915/intel_uc.h
+++ b/drivers/gpu/drm/i915/intel_uc.h
@@ -39,6 +39,7 @@
  void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
  int intel_uc_init(struct drm_i915_private *dev_priv);
  void intel_uc_fini(struct drm_i915_private *dev_priv);
+void intel_uc_prepare_to_reset(struct drm_i915_private *dev_priv);
    static inline bool intel_uc_is_using_guc(void)
  {
diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h b/drivers/gpu/drm/i915/intel_uc_fw.h
index d5fd460..f1ee653 100644
--- a/drivers/gpu/drm/i915/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/intel_uc_fw.h
@@ -115,6 +115,12 @@ static inline bool intel_uc_fw_is_selected(struct intel_uc_fw *uc_fw)
      return uc_fw->path != NULL;
  }
  +static inline void intel_uc_fw_prepare_to_reset(struct intel_uc_fw *uc_fw)
+{
+    if (uc_fw->load_status == INTEL_UC_FIRMWARE_SUCCESS)
+        uc_fw->load_status = INTEL_UC_FIRMWARE_PENDING;
+}
+
  void intel_uc_fw_fetch(struct drm_i915_private *dev_priv,
                 struct intel_uc_fw *uc_fw);
  int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,



--
Thanks,
Sagar

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux