Just found my previous response click on "reply", not the "reply all",
so add Cc list.
Regards,
Zhanjun Dong
-------- Forwarded Message --------
Subject: Re: [PATCH v1] drm/i915/guc: Always disable interrupt ahead of
synchronize_irq
Date: Mon, 27 Jan 2025 17:17:33 -0500
From: Dong, Zhanjun <zhanjun.dong@xxxxxxxxx>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
Please see my comments below.
Daniele raised a good point on GuC interrupt, I will dig deeper for more
info.
Regards,
Zhanjun Dong
On 2025-01-27 12:12 p.m., Daniele Ceraolo Spurio wrote:
On 1/23/2025 8:23 AM, Zhanjun Dong wrote:
The purpose of synchronize_irq is to wait for any pending IRQ handlers
for the
interrupt to complete, if synchronize_irq called before interrupt
disabled, an
tiny timing window created, where no more pending IRQ, but interrupt not
disabled yet. Meanwhile, if the interrupt event happened in this
timing window,
an unexpected IRQ handling will be triggered.
Fixed by always disable interrupt ahead of synchronize_irq.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13454
Fixes: 26705e20752a ("drm/i915: Support for GuC interrupts")
Fixes: 54c52a841250 ("drm/i915/guc: Correctly handle GuC interrupts on
Gen11")
Fixes: 2ae096872a2c ("drm/i915/pxp: Implement PXP irq handler")
Fixes: 3e7abf814193 ("drm/i915: Extract GT render power state
management")
Signed-off-by: Zhanjun Dong <zhanjun.dong@xxxxxxxxx>
---
Cc: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/intel_rps.c | 3 +--
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 ++--
drivers/gpu/drm/i915/pxp/intel_pxp_irq.c | 2 +-
3 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/
i915/gt/intel_rps.c
index fa304ea088e4..0fe7a8d7f460 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -244,8 +244,8 @@ static void rps_disable_interrupts(struct
intel_rps *rps)
gen6_gt_pm_disable_irq(gt, GEN6_PM_RPS_EVENTS);
spin_unlock_irq(gt->irq_lock);
+ rps_reset_interrupts(rps);
intel_synchronize_irq(gt->i915);
I don't think this is an issue, because we set the irq mask in
gen6_gt_pm_disable_irq, so there is no chance of getting any new
interrupts after that. Not saying that we shouldn't do the re-order, but
we don't need a fixes tag for this.
Good point, the interrupt already disabled by gen6_gt_pm_disable_irq,
the rps_reset_interrupts is just clear interrupt bits, so no risk here,
therefor no issue at all.
Anyway, re-order is still a good practice, the interrupt bits set/and
clear is all about the interrupt bits, these 2 actions could be consider
logically tight relationship, the intel_synchronize_irq considered to be
the next step.
So fixes tag should be removed. As not an issue fix, this work could be
done separately.
-
/*
* Now that we will not be generating any more work, flush any
* outstanding tasks. As we are called on the RPS idle path,
@@ -254,7 +254,6 @@ static void rps_disable_interrupts(struct
intel_rps *rps)
*/
cancel_work_sync(&rps->work);
- rps_reset_interrupts(rps);
GT_TRACE(gt, "interrupts:off\n");
}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/
i915/gt/uc/intel_guc.c
index 5949ff0b0161..3e7b2c6cdca4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -116,9 +116,9 @@ static void gen9_disable_guc_interrupts(struct
intel_guc *guc)
gen6_gt_pm_disable_irq(gt, gt->pm_guc_events);
spin_unlock_irq(gt->irq_lock);
- intel_synchronize_irq(gt->i915);
gen9_reset_guc_interrupts(guc);
+ intel_synchronize_irq(gt->i915);
Same as above with gen6_gt_pm_disable_irq
}
static bool __gen11_reset_guc_interrupts(struct intel_gt *gt)
@@ -154,9 +154,9 @@ static void gen11_disable_guc_interrupts(struct
intel_guc *guc)
struct intel_gt *gt = guc_to_gt(guc);
guc->interrupts.enabled = false;
- intel_synchronize_irq(gt->i915);
gen11_reset_guc_interrupts(guc);
+ intel_synchronize_irq(gt->i915);
No early disabling here, but I don't think this change helps either.
AFAICS gen11_reset_guc_interrupts() only calls gen11_gt_reset_one_iir(),
which just clears the IIR bits for the GuC. There are no changes in
interrupt enable/mask status, so interrupts can still be generated. The
way interrupts are stopped for gen11+ is by setting guc-
>interrupts.enabled, because that's checked from both guc_irq_handler()
and intel_guc_to_host_event_handler(), so any new interrupts generated
after we set that should be immediately dropped.
Good point! Because there are no changes in interrupt enable/mask
status, guc interrupt won't stops, therefor synchronize irq won't make
changes.
Let me dig deeper and get back to you once I have more info.
}
static void guc_dead_worker_func(struct work_struct *w)
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c b/drivers/gpu/
drm/i915/pxp/intel_pxp_irq.c
index d81750b9bdda..b82a667e7ac0 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -101,9 +101,9 @@ void intel_pxp_irq_disable(struct intel_pxp *pxp)
__pxp_set_interrupts(gt, 0);
spin_unlock_irq(gt->irq_lock);
- intel_synchronize_irq(gt->i915);
pxp_irq_reset(gt);
+ intel_synchronize_irq(gt->i915);
Again not a bug here, __pxp_set_interrupts is doing the interrupts
disabling and that is already happening before intel_synchronize_irq().
Same to above rps case, not a bug.
Daniele
flush_work(&pxp->session_work);
}