Re: [PATCH] drm/i915/pcode: Wait 10 seconds for pcode to settle

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 1/11/23 5:36 PM, Andi Shyti wrote:
On Wed, Jan 11, 2023 at 03:18:38PM +0200, Jani Nikula wrote:
On Wed, 11 Jan 2023, Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> wrote:
From: Aravind Iddamsetty <aravind.iddamsetty@xxxxxxxxx>

During module load not all the punit transaction have completed
and we might end up timing out, as shown by the following
warning:

Root cause?

Hi Andi, looking at the log where this problem is reported,

https://gitlab.freedesktop.org/drm/intel/-/issues/7814

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1294/bat-atsm-1/igt@i915_module_load@xxxxxxxxxxxxxxx#dmesg-warnings17324

<4> [268.276908] i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)

In the code above, the call stack is output, and the return value of intel_pcode_init() returns -11 (-EAGAIN).

<3> [268.329058] i915 0000:4d:00.0: [drm] *ERROR* gt0: intel_pcode_init failed -11


If you simplify the function call flow, you can see it as below. (The call of _wait_for(COND, timeout_base_ms * 1000, 10, 10) in skl_pcode_request() is omitted)

-------------------------------------------------------------------------
intel_pcode_init()
 |
 +-> skl_pcode_request(uncore, DG1_PCODE_STATUS,
                       DG1_UNCORE_GET_INIT_STATUS,
                       DG1_UNCORE_INIT_STATUS_COMPLETE,
                       DG1_UNCORE_INIT_STATUS_COMPLETE, 180000);
       |
       +-> skl_pcode_try_request()
             |
             +->  *status = __snb_pcode_rw(uncore, mbox, &request, NULL,
                                           500, 0, true);

-------------------------------------------------------------------------
static int __snb_pcode_rw(struct intel_uncore *uncore, u32 mbox,
		          u32 *val, u32 *val1,
			  int fast_timeout_us, int slow_timeout_ms,
			  bool is_read)
{
...

	if (intel_uncore_read_fw(uncore, GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
		return -EAGAIN;

	intel_uncore_write_fw(uncore, GEN6_PCODE_DATA, *val);
	intel_uncore_write_fw(uncore, GEN6_PCODE_DATA1, val1 ? *val1 : 0);
	intel_uncore_write_fw(uncore,
			      GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);

	if (__intel_wait_for_register_fw(uncore,
					 GEN6_PCODE_MAILBOX,
					 GEN6_PCODE_READY, 0,
					 fast_timeout_us,
					 slow_timeout_ms,
					 &mbox))
		return -ETIMEDOUT;

...
}
-------------------------------------------------------------------------

The case where skl_pcode_request() returns -EAGAIN can be considered as the following scenario. 1. 1. When checking the GEN6_PCODE_MAILBOX register status before writing the value to the GEN6_PCODE_DATA register in __snb_pcode_rw(), it is always BUSY 2. _wait_for(COND, timeout_base_ms * 1000, 10, 10) of skl_pcode_request() returns -EAGAIN if the GEN6_PCODE_MAILBOX register indicates BUSY even after waiting 500us after writing a value to the GEN6_PCODE_DATA register in __snb_pcode_rw()

(Even if skl_pcode_request() gives a timeout of 180 seconds, the time used each time __snb_pcode_rw() is called is up to 500us. The rest of the time is used for sleep.)

In the situation where the problem is reported, it is not possible to confirm exactly which scenario code causes the problem with the current log information, and it seems that additional analysis is needed to confirm it. If the hardware takes more than 500us to succeed after PCODE_MAILBOX write under certain circumstances, it is thought that the root problem causing the problem should be fixed.

br,
G.G.



    i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)

Wait 10 seconds for the punit to settle and complete any
outstanding transactions upon module load.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7814


No blank lines between the tag lines please.

I don't consider "Closes:" to be a tag even if someone is using
it as such. AFAIK is not mentioned it in any of the kernel docs
(e.g. Documentation/process/maintainer-tip.rst).

Not to confuse it with a normal tag, I wanted to put it in
parenthesis or rephrase it.

Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@xxxxxxxxx>
Co-developed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
Signed-off-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/intel_pcode.c | 35 ++++++++++++++++++++++++++----
  1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pcode.c b/drivers/gpu/drm/i915/intel_pcode.c
index a234d9b4ed14..3db2ba439bb5 100644
--- a/drivers/gpu/drm/i915/intel_pcode.c
+++ b/drivers/gpu/drm/i915/intel_pcode.c
@@ -204,15 +204,42 @@ int skl_pcode_request(struct intel_uncore *uncore, u32 mbox, u32 request,
  #undef COND
  }
+static int pcode_init_wait(struct intel_uncore *uncore, int timeout_ms)
+{
+	if (__intel_wait_for_register_fw(uncore,
+					 GEN6_PCODE_MAILBOX,
+					 GEN6_PCODE_READY, 0,
+					 500, timeout_ms,
+					 NULL))
+		return -EPROBE_DEFER;

This is already done within skl_pcode_request -> skl_pcode_try_request
-> __snb_pcode_rw path, with waits in skl_pcode_request.

the idea is to check for PCODE_READY even before checking if
data are sent/received by pcode. And this is only during boot
time. While skl_pcode_request is called in other contexts as
well.

In other words here I want to start the communication when I
already know that the punit is ready. Otherwise I would hit an
-EAGAIN and fail.

Is there anyone who still understands what's being waited for, where,
for how long, and why in the different code paths? I know I don't, and
this isn't helping.

I think it depends on hardware. There are some documents roaming
around with some boot time and reset time calculation.

There's also no explanation on the -EPROBE_DEFER return in the commit
message or comments or anywhere.

we haven't really failed, right? We just need some time for the
punit to be ready and try to probe again (remember we are here in
module probe).

Thanks for the review,
Andi

Again, root cause?

BR,
Jani.


+
+	return skl_pcode_request(uncore,
+				 DG1_PCODE_STATUS,
+				 DG1_UNCORE_GET_INIT_STATUS,
+				 DG1_UNCORE_INIT_STATUS_COMPLETE,
+				 DG1_UNCORE_INIT_STATUS_COMPLETE, timeout_ms);
+}
+
  int intel_pcode_init(struct intel_uncore *uncore)
  {
+	int err;
+
  	if (!IS_DGFX(uncore->i915))
  		return 0;
- return skl_pcode_request(uncore, DG1_PCODE_STATUS,
-				 DG1_UNCORE_GET_INIT_STATUS,
-				 DG1_UNCORE_INIT_STATUS_COMPLETE,
-				 DG1_UNCORE_INIT_STATUS_COMPLETE, 180000);
+	/*
+	 * Wait 10 seconds so that the punit to settle and complete
+	 * any outstanding transactions upon module load
+	 */
+	err = pcode_init_wait(uncore, 10000);
+
+	if (err) {
+		drm_notice(&uncore->i915->drm,
+			   "Waiting for HW initialisation...\n");
+		err = pcode_init_wait(uncore, 180000);
+	}
+
+	return err;
  }
int snb_pcode_read_p(struct intel_uncore *uncore, u32 mbcmd, u32 p1, u32 p2, u32 *val)

--
Jani Nikula, Intel Open Source Graphics Center



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux