Re: [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest

Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx> · Fri, 4 Nov 2022 07:58:43 -0700

On Fri, Nov 04, 2022 at 08:29:38AM +0000, Tvrtko Ursulin wrote:

On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:
On Thu, Nov 03, 2022 at 12:28:46PM +0000, Tvrtko Ursulin wrote:

On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:
Engine busyness sampling over a 10 ms period is failing, with busyness
ranging from approximately 87% to 115%. The expected range is +/- 5% of
the sample period.

When determining the busyness of an active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900 us to
1.5 ms.

Do I read this right - that the latency of a 64 bit timestamp 
register read is 0.9 - 1.5ms? That would be the read in 
guc_update_pm_timestamp?

Correct. That is the total time taken by intel_uncore_read64_2x32(),
measured with local_clock().
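
For readers following along, the measurement described above amounts to
bracketing the read with two clock samples. The following is a minimal
userspace sketch of that pattern, not the kernel code: it uses
clock_gettime(CLOCK_MONOTONIC) in place of local_clock(), and now_ns(),
time_op_ns() and sample_op() are hypothetical names introduced here for
illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; a userspace stand-in for local_clock(). */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: time a single call of op(arg), returning elapsed ns. */
static int64_t time_op_ns(void (*op)(void *), void *arg)
{
	int64_t start = now_ns();

	op(arg);
	return now_ns() - start;
}

/* Placeholder for the operation under test (for example a register read). */
static void sample_op(void *arg)
{
	(*(int *)arg)++;
}
```

Calling time_op_ns(sample_op, &counter) returns the elapsed time for one
invocation; any real measurement would replace sample_op() with the read
being investigated.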

One other thing I left out of the comments is that enable_dc=0
also resolves the issue, but the display team confirmed there is no
relation to display in this case other than that it somehow
introduces a latency in the reg read.

Could it be the DMC wreaking havoc, similar to b68763741aa2
("drm/i915: Restore GT performance in headless mode with DMC loaded")?


__gt_unpark is already doing a 

gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);

I would assume that __gt_unpark was called prior to running the
selftest, but I need to confirm that.

One solution tried was to reduce the latency between the reg read and
the CPU timestamp capture, but such an optimization does not add value
to the user since the CPU timestamp obtained here is only used for (1)
the selftest and (2) the i915 rps implementation specific to the
execlist scheduler. Also, this solution only reduces the frequency of
failure and does not eliminate it.

Note that this solution is here - 
https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1

but I do not intend to use it since it only reduces the frequency
of failures; the inherent issue still exists.

Right, I'd just go with that as well if it makes a significant 
improvement. Or even just refactor intel_uncore_read64_2x32 to be 
under one spinlock/fw. I don't see that it can have an excuse to be 
less efficient since there's a loop in there.
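
To illustrate the kind of loop being discussed, here is a userspace sketch
of reading a 64 bit counter as two 32 bit halves with a retry when the
upper half changes, while taking a lock only once around the whole
sequence. It is not the i915 helper: struct sim_counter, sim_lock(),
raw_read_lower(), raw_read_upper() and read64_single_lock() are all
hypothetical names, the lock is only a counter, and the retry bound of 2
is an assumption made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a free-running 64 bit hardware counter that is
 * only readable as two 32 bit halves. Nothing here is i915 code.
 */
struct sim_counter {
	uint64_t value;     /* current counter value */
	uint64_t step;      /* ticks that elapse during each 32 bit read */
	unsigned int locks; /* number of times the simulated lock was taken */
};

static void sim_lock(struct sim_counter *c)   { c->locks++; }
static void sim_unlock(struct sim_counter *c) { (void)c; }

/* Raw 32 bit reads; the counter keeps ticking while it is being read. */
static uint32_t raw_read_lower(struct sim_counter *c)
{
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t raw_read_upper(struct sim_counter *c)
{
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}

/* Take the lock once, then read both halves, retrying on upper rollover. */
static uint64_t read64_single_lock(struct sim_counter *c)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	sim_lock(c);
	upper = raw_read_upper(c);
	do {
		old_upper = upper;
		lower = raw_read_lower(c);
		upper = raw_read_upper(c);
	} while (upper != old_upper && loop++ < 2);
	sim_unlock(c);

	return ((uint64_t)upper << 32) | lower;
}
```

A variant that locked around each individual 32 bit read would take the
lock once per read, so at least three times per 64 bit value in this
model, which is the overhead the single-lock form avoids.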

The patch did reduce the failure rate to once in 200 runs, versus once in 10 runs.

I will refactor the helper in that case.

Thanks,
Umesh


Regards,

Tvrtko

Regards,
Umesh


In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.
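
As a rough check of why a longer period helps, the sketch below computes
the relative error that a given read latency contributes to a sample
period. It assumes, as a simplification, that the whole latency shows up
as error in the measured busy time; latency_error_pct() and
within_tolerance() are hypothetical helpers written for this illustration
and are not part of the selftest.

```c
/*
 * Relative error, in percent, contributed by a read latency of latency_us
 * within a sample period of period_us (simplified model, see above).
 */
static double latency_error_pct(double latency_us, double period_us)
{
	return 100.0 * latency_us / period_us;
}

/* Returns 1 if that error stays within tol_pct percent, 0 otherwise. */
static int within_tolerance(double latency_us, double period_us,
			    double tol_pct)
{
	return latency_error_pct(latency_us, period_us) <= tol_pct;
}
```

Under this model, the 900 us to 1.5 ms latencies reported above amount to
9% to 15% of a 10 ms period, outside the +/- 5% window, but only 0.9% to
1.5% of a 100 ms period.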

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
         ENGINE_TRACE(engine, "measuring busy time\n");
         preempt_disable();
         de = intel_engine_get_busy_time(engine, &t[0]);
-        mdelay(10);
+        mdelay(100);
         de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
         preempt_enable();
         dt = ktime_sub(t[1], t[0]);