On 04/10/2023 10:43, Andi Shyti wrote:
The MCR steering semaphore is a shared lock entry between i915
and various firmware components.
Getting the lock might synchronize on some shared resources.
Sometimes, though, the firmware forgets to unlock, causing
unnecessary noise in the driver, which keeps doing what it was
supposed to do, ignoring the problem.
Do not consider this failure as an error, but just print a debug
message stating that the MCR locking has been skipped.
On the driver side we still have spinlocks that make sure that
the access to the resources is serialized.
Signed-off-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Cc: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx>
Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
Cc: Nirmoy Das <nirmoy.das@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
index 326c2ed1d99b..51eb693df39b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
@@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
 	 * would indicate some hardware/firmware is misbehaving and not
 	 * releasing it properly.
 	 */
-	if (err == -ETIMEDOUT) {
-		gt_err_ratelimited(gt, "hardware MCR steering semaphore timed out");
-		add_taint_for_CI(gt->i915, TAINT_WARN);  /* CI is now unreliable */
-	}
+	if (err == -ETIMEDOUT)
+		gt_dbg(gt, "hardware MCR steering semaphore timed out");
 }
 /**
Are we sure this does not warrant a level higher than dbg, such as
notice/warn? Because how can we be sure the two entities will not stomp
on each other's toes if we failed to obtain the lock? (How can we be sure
about "forgot to unlock" vs "in prolonged active use"? Or if we can be
sure, can we force unlock and take the lock for the driver explicitly?)
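
To make the two options in the question concrete, here is a minimal
userspace sketch of "time out and carry on" versus "time out and take the
lock explicitly". It is only an illustration under stated assumptions: the
names fake_sem, fake_sem_lock, fake_sem_unlock and fake_sem_lock_or_steal
are hypothetical, the semaphore is modelled as a C11 atomic flag rather
than a hardware register, and none of this is i915 code.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared semaphore: true means "held". */
struct fake_sem {
	atomic_bool held;
};

/*
 * Poll the semaphore at most max_polls times.
 * Returns 0 once it is acquired, -ETIMEDOUT if it stayed held throughout.
 */
static int fake_sem_lock(struct fake_sem *sem, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		bool expected = false;

		if (atomic_compare_exchange_strong(&sem->held, &expected, true))
			return 0;
	}

	return -ETIMEDOUT;
}

static void fake_sem_unlock(struct fake_sem *sem)
{
	atomic_store(&sem->held, false);
}

/*
 * Alternative policy: on timeout, assume the other side forgot to unlock
 * and claim ownership explicitly. This is only safe if "forgot to unlock"
 * can really be told apart from "in prolonged active use"; the model
 * cannot decide that, it merely reports that a steal happened.
 */
static int fake_sem_lock_or_steal(struct fake_sem *sem, unsigned int max_polls,
				  bool *stolen)
{
	int err = fake_sem_lock(sem, max_polls);

	*stolen = false;
	if (err == -ETIMEDOUT) {
		atomic_store(&sem->held, true); /* forced takeover */
		*stolen = true;
		return 0;
	}

	return err;
}
```

With the first helper the caller sees -ETIMEDOUT and has to choose a log
level and continue without the lock, which is what the patch does; with the
second the caller always ends up as owner, but a reported steal would still
deserve more than a debug message.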
Regards,
Tvrtko