Re: [RFC PATCH] drm/i915/gt: Do not treat MCR locking timeouts as errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/4/2023 07:08, Andi Shyti wrote:
Hi Tvrtko,

The MCR steering semaphore is a shared lock entry between i915
and various firmware components.

Getting the lock might sinchronize on some shared resources.
Sometimes though, it might happen that the firmware forgets to
unlock causing unnecessary noise in the driver which keeps doing
what was supposed to do, ignoring the problem.

Do not consider this failure as an error, but just print a debug
message stating that the MCR locking has been skipped.

On the driver side we still have spinlocks that make sure that
the access to the resources is serialized.

Signed-off-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Cc: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx>
Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
Cc: Nirmoy Das <nirmoy.das@xxxxxxxxx>
---
   drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 6 ++----
   1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
index 326c2ed1d99b..51eb693df39b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
@@ -395,10 +395,8 @@ void intel_gt_mcr_lock(struct intel_gt *gt, unsigned long *flags)
   	 * would indicate some hardware/firmware is misbehaving and not
   	 * releasing it properly.
   	 */
-	if (err == -ETIMEDOUT) {
-		gt_err_ratelimited(gt, "hardware MCR steering semaphore timed out");
-		add_taint_for_CI(gt->i915, TAINT_WARN);  /* CI is now unreliable */
-	}
+	if (err == -ETIMEDOUT)
+		gt_dbg(gt, "hardware MCR steering semaphore timed out");
   }
   /**
Are we sure this does not warrant a level higher than dbg, such as
notice/warn?
We might make it a warn, but this doesn't change much the economy
of the driver as we will keep doing what we were supposed to do.

Because how can we be sure the two entities will not stomp on
each other toes if we failed to obtain lock?
So far, in all the research I've done, no one looks like using
MCR lock, but yet someone is stuck in it.
If someone has the lock then that someone thinks they are using it. Just because you can't see what someone piece of IFWI is doing doesn't mean it isn't doing it. And if it is a genuinely missing unlock then it needs to be tracked down and fixed with an IFWI update otherwise the system is going to be unstable from that point on.


(How can we be sure about
"forgot to unlock" vs "in prolonged active use"?
There is a patch from Jonathan that is testing a different
timeout.

Or if we can be sure, can
we force unlock and take the lock for the driver explicitly?)
I sent a patch with this solution and Matt turned it down.
Presumably because both forcing a lock and ignoring a failed lock are Bad Things to be doing!

Just because some entity out of our control isn't playing friendly doesn't mean we can ignore the problem. The lock is there for a reason. If someone else owns the lock then any steered access by i915 is potentially going to be routed to the wrong register as the other entity re-directs the steering behind out back. That is why there is an error message being printed. Because things are quite possibly going to fail in some unknown manner.

John.


Andi




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux