Fwd: [PATCH v1] drm/i915/guc: Always disable interrupt ahead of synchronize_irq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just found my previous response click on "reply", not the "reply all", so add Cc list.

Regards,
Zhanjun Dong


-------- Forwarded Message --------
Subject: Re: [PATCH v1] drm/i915/guc: Always disable interrupt ahead of synchronize_irq
Date: Mon, 27 Jan 2025 17:17:33 -0500
From: Dong, Zhanjun <zhanjun.dong@xxxxxxxxx>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>

Please see my comments below.
Daniele raised a good point on GuC interrupt, I will dig deeper for more info.

Regards,
Zhanjun Dong

On 2025-01-27 12:12 p.m., Daniele Ceraolo Spurio wrote:



On 1/23/2025 8:23 AM, Zhanjun Dong wrote:
The purpose of synchronize_irq is to wait for any pending IRQ handlers for the interrupt to complete, if synchronize_irq called before interrupt disabled, an
tiny timing window created, where no more pending IRQ, but interrupt not
disabled yet. Meanwhile, if the interrupt event happened in this timing window,
an unexpected IRQ handling will be triggered.

Fixed by always disable interrupt ahead of synchronize_irq.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13454
Fixes: 26705e20752a ("drm/i915: Support for GuC interrupts")
Fixes: 54c52a841250 ("drm/i915/guc: Correctly handle GuC interrupts on Gen11")
Fixes: 2ae096872a2c ("drm/i915/pxp: Implement PXP irq handler")
Fixes: 3e7abf814193 ("drm/i915: Extract GT render power state management")

Signed-off-by: Zhanjun Dong <zhanjun.dong@xxxxxxxxx>

---
Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxx>
---
  drivers/gpu/drm/i915/gt/intel_rps.c      | 3 +--
  drivers/gpu/drm/i915/gt/uc/intel_guc.c   | 4 ++--
  drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 2 +-
  3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/ i915/gt/intel_rps.c
index fa304ea088e4..0fe7a8d7f460 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -244,8 +244,8 @@ static void rps_disable_interrupts(struct intel_rps *rps)
      gen6_gt_pm_disable_irq(gt, GEN6_PM_RPS_EVENTS);
      spin_unlock_irq(gt->irq_lock);
+    rps_reset_interrupts(rps);
      intel_synchronize_irq(gt->i915);

I don't think this is an issue, because we set the irq mask in gen6_gt_pm_disable_irq, so there is no chance of getting any new interrupts after that. Not saying that we shouldn't do the re-order, but we don't need a fixes tag for this.
Good point, the interrupt already disabled by gen6_gt_pm_disable_irq, the rps_reset_interrupts is just clear interrupt bits, so no risk here, therefor no issue at all. Anyway, re-order is still a good practice, the interrupt bits set/and clear is all about the interrupt bits, these 2 actions could be consider logically tight relationship, the intel_synchronize_irq considered to be the next step. So fixes tag should be removed. As not an issue fix, this work could be done separately.


-
      /*
       * Now that we will not be generating any more work, flush any
       * outstanding tasks. As we are called on the RPS idle path,
@@ -254,7 +254,6 @@ static void rps_disable_interrupts(struct intel_rps *rps)
       */
      cancel_work_sync(&rps->work);
-    rps_reset_interrupts(rps);
      GT_TRACE(gt, "interrupts:off\n");
  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/ i915/gt/uc/intel_guc.c
index 5949ff0b0161..3e7b2c6cdca4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -116,9 +116,9 @@ static void gen9_disable_guc_interrupts(struct intel_guc *guc)
      gen6_gt_pm_disable_irq(gt, gt->pm_guc_events);
      spin_unlock_irq(gt->irq_lock);
-    intel_synchronize_irq(gt->i915);
      gen9_reset_guc_interrupts(guc);
+    intel_synchronize_irq(gt->i915);

Same as above with gen6_gt_pm_disable_irq

  }
  static bool __gen11_reset_guc_interrupts(struct intel_gt *gt)
@@ -154,9 +154,9 @@ static void gen11_disable_guc_interrupts(struct intel_guc *guc)
      struct intel_gt *gt = guc_to_gt(guc);
      guc->interrupts.enabled = false;
-    intel_synchronize_irq(gt->i915);
      gen11_reset_guc_interrupts(guc);
+    intel_synchronize_irq(gt->i915);

No early disabling here, but I don't think this change helps either. AFAICS gen11_reset_guc_interrupts() only calls gen11_gt_reset_one_iir(), which just clears the IIR bits for the GuC. There are no changes in interrupt enable/mask status, so interrupts can still be generated. The way interrupts are stopped for gen11+ is by setting guc- >interrupts.enabled, because that's checked from both guc_irq_handler() and intel_guc_to_host_event_handler(), so any new interrupts generated after we set that should be immediately dropped.
Good point! Because there are no changes in interrupt enable/mask status, guc interrupt won't stops, therefor synchronize irq won't make changes.
Let me dig deeper and get back to you once I have more info.


  }
  static void guc_dead_worker_func(struct work_struct *w)
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c b/drivers/gpu/ drm/i915/pxp/intel_pxp_irq.c
index d81750b9bdda..b82a667e7ac0 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -101,9 +101,9 @@ void intel_pxp_irq_disable(struct intel_pxp *pxp)
      __pxp_set_interrupts(gt, 0);
      spin_unlock_irq(gt->irq_lock);
-    intel_synchronize_irq(gt->i915);
      pxp_irq_reset(gt);
+    intel_synchronize_irq(gt->i915);

Again not a bug here, __pxp_set_interrupts is doing the interrupts disabling and that is already happening before intel_synchronize_irq().
Same to above rps case, not a bug.

Daniele

      flush_work(&pxp->session_work);
  }





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux