On Mon, Nov 07, 2022 at 10:16:20AM +0000, Tvrtko Ursulin wrote:
On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
Engine busyness sampling over a 10ms period is failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.
When determining the busyness of an active engine, the GuC-based engine
busyness implementation relies on a 64-bit timestamp register read. The
latency incurred by this register read causes the failure.
On DG1, when the test fails, the observed latencies range from 900us to
1.5ms.
Is it at all faster with the locked 2x32, or can the same unexplained
display-related latencies still happen?
Considering that originally this failed 1 in 10 runs, the locked 2x32
patch in this series reduces the failure rate to 1 in 50.
What really helps is taking the CPU timestamp within the forcewake
block: the correlation between GPU/CPU times is then very good, and that
reduces the selftest failure frequency (1 in 200). More like this:
uncore_lock
fw_get
read 64-bit GPU time
read CPU timestamp
fw_put
uncore_unlock.
I recall we had arrived at this sequence in the past when implementing
query_cs_cycles
- https://patchwork.freedesktop.org/patch/432041/?series=89766&rev=1
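To make the ordering above concrete, here is a minimal userspace sketch. It is not i915 code: every function below is a hypothetical stand-in with the same name as the step it models, the clock is simulated, and the 1 ms forcewake cost and 1 us register read cost are illustrative assumptions only. The contrast case (CPU timestamp taken before the lock) is also an assumption, added only to show how any delay between the two reads turns into GPU/CPU skew.

```c
#include <stdint.h>

/* Simulated clock in nanoseconds. All names are hypothetical stand-ins. */
static uint64_t sim_now_ns;

/* Assumed cost of acquiring forcewake; illustrative value only. */
#define SIM_FW_GET_COST_NS   1000000ull
/* Assumed cost of one 64-bit register read; illustrative value only. */
#define SIM_REG_READ_COST_NS 1000ull

static void uncore_lock(void)   { }
static void uncore_unlock(void) { }
static void fw_get(void)        { sim_now_ns += SIM_FW_GET_COST_NS; }
static void fw_put(void)        { }

/* The GPU time is modelled as the simulated clock after the read cost. */
static uint64_t read_gpu_time64(void)
{
	sim_now_ns += SIM_REG_READ_COST_NS;
	return sim_now_ns;
}

static uint64_t read_cpu_timestamp(void)
{
	return sim_now_ns;
}

struct sample {
	uint64_t gpu_ns;
	uint64_t cpu_ns;
};

/* Sequence from the mail: CPU timestamp taken inside the forcewake block. */
static struct sample sample_inside_fw(void)
{
	struct sample s;

	uncore_lock();
	fw_get();
	s.gpu_ns = read_gpu_time64();
	s.cpu_ns = read_cpu_timestamp();
	fw_put();
	uncore_unlock();
	return s;
}

/* Contrast case (assumed): CPU timestamp taken before taking the lock. */
static struct sample sample_outside_fw(void)
{
	struct sample s;

	s.cpu_ns = read_cpu_timestamp();
	uncore_lock();
	fw_get();
	s.gpu_ns = read_gpu_time64();
	fw_put();
	uncore_unlock();
	return s;
}

/* Delay that separates the two reads, i.e. the GPU/CPU skew in this model. */
static uint64_t skew_ns(struct sample s)
{
	return s.gpu_ns - s.cpu_ns;
}
```

In this model, skew_ns(sample_inside_fw()) is 0, while skew_ns(sample_outside_fw()) equals the full forcewake plus read cost. Real hardware will not be this clean; the point is only that the forcewake latency drops out of the skew when both reads sit inside the block.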
I still included the locked 2x32 patch here because 1 in 50 is still
better than 1 in 10.
For now, 100 ms sample period is the only promising solution I see. No
failures for 1000 runs.
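As a rough sanity check of why the longer period helps, the sketch below assumes a first-order model in which the whole register read latency shows up as error in the measured busyness. The helper names are hypothetical and the model is an assumption, not something taken from the driver.

```c
/*
 * Relative error in percent if a read latency of latency_us is folded
 * into a sample period of period_us (assumed first-order model).
 */
static double busyness_error_pct(double latency_us, double period_us)
{
	return 100.0 * latency_us / period_us;
}

/* Returns 1 if the modelled error fits within +/- tol_pct, 0 otherwise. */
static int within_tolerance(double latency_us, double period_us,
			    double tol_pct)
{
	return busyness_error_pct(latency_us, period_us) <= tol_pct;
}
```

With the latencies seen on DG1, busyness_error_pct(900, 10000) gives 9% and busyness_error_pct(1500, 10000) gives 15%, both outside the 5% tolerance and roughly in line with the 87% to 115% range. At a 100 ms period the same latencies give 0.9% and 1.5%.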
Thanks,
Umesh
One solution tried was to reduce the latency between the reg read and
the CPU timestamp capture, but such an optimization does not add value
to the user, since the CPU timestamp obtained here is only used for
(1) the selftest and (2) the i915 rps implementation specific to the
execlist scheduler. Also, this solution only reduces the frequency of
failure and does not eliminate it.
In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
 		ENGINE_TRACE(engine, "measuring busy time\n");
 		preempt_disable();
 		de = intel_engine_get_busy_time(engine, &t[0]);
-		mdelay(10);
+		mdelay(100);
 		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 		preempt_enable();
 		dt = ktime_sub(t[1], t[0]);
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Regards,
Tvrtko