Re: [PATCH] drm/i915/gsc: Fix the Driver-FLR completion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/22/2023 1:01 PM, Alan Previn wrote:
The Driver-FLR flow may inadvertently exit early before the full
completion of the re-init of the internal HW state if we only poll
GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
we need a two-step completion wait-for-completion flow that also
involves GU_CNTL. See the patch and new code comments for detail.
This is new direction from HW architecture folks.

    v2: - Add error message for the teardown timeout (Anshuman)
        - Don't duplicate code in comments (Jani)

Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
Fixes: 5a44fcd73498 ("drm/i915/gsc: Do a driver-FLR on unload if GSC was loaded")

I'm not sure if we need a fixes tag, given that this is MTL specific code and that's still under force probe.

---
  drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++++-
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index f018da7ebaac..f3c46352db89 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2749,14 +2749,25 @@ static void driver_initiated_flr(struct intel_uncore *uncore)
  	/* Trigger the actual Driver-FLR */
  	intel_uncore_rmw_fw(uncore, GU_CNTL, 0, DRIVERFLR);
+ /* Wait for hardware teardown to complete */
+	ret = intel_wait_for_register_fw(uncore, GU_CNTL,
+					 DRIVERFLR_STATUS, 0,

shouldn't this bit be DRIVERFLR instead of DRIVERFLR_STATUS ? I know they're both BIT(31), but DRIVERFLR_STATUS is the definition for the GU_DEBUG bit, while this wait is on GU_CNTL.

+					 flr_timeout_ms);
+	if (ret) {
+		drm_err(&i915->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
+		return;
+	}
+
+	/* Wait for hardware/firmware re-init to complete */
  	ret = intel_wait_for_register_fw(uncore, GU_DEBUG,
  					 DRIVERFLR_STATUS, DRIVERFLR_STATUS,
  					 flr_timeout_ms);

I was wondering if we could reduce the timing here to avoid 2 waits of 3 seconds, as the 3 seconds should be for the full process. However, the specs don't say how much each step can take, so I agree that to be safe is better to have both timeouts at 3 seconds. If the FLR fails the HW is toast anyway, so waiting a few seconds more to detect it on driver unload is not going to have additional consequences that we wouldn't already have.

With the bit in the wait above fixed:
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>

Daniele

  	if (ret) {
-		drm_err(&i915->drm, "wait for Driver-FLR completion failed! %d\n", ret);
+		drm_err(&i915->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
  		return;
  	}
+ /* Clear sticky completion status */
  	intel_uncore_write_fw(uncore, GU_DEBUG, DRIVERFLR_STATUS);
  }




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux