Re: [PATCH] drm/xe/client: Better correlate exec_queue and GT timestamps

On Thu, Dec 12, 2024 at 09:34:32AM -0800, Lucas De Marchi wrote:
> This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
> queue file lock usage."). While it's desired to have the mutex protect
> only the reference to the exec queue, getting and dropping the mutex for
> each queue and only later reading the GPU timestamp doesn't produce a
> correct result: it introduces multiple opportunities for the task to be
> scheduled out, wreaking havoc on the deltas reported to userspace.
> 
> Also, to better correlate the timestamps from the exec queues with the
> GPU, disable preemption so they can be updated without allowing the task
> to be scheduled out. Interrupts are left enabled, as that shouldn't be
> enough disturbance for the deltas to matter to userspace.
> 
> Test scenario:
> 
> 	* IGT's `xe_drm_fdinfo --r utilization-single-full-load`
> 	* Platform: LNL, where CI occasionally reports failures
> 	* `stress -c $(nproc)` running in parallel to disturb the
> 	  system
> 
> This brings the first failure from "after ~150 executions" to "never
> occurs after 1000 attempts".


So it fails after 2704 runs, but it seems to be failing for a reason
other than the one fixed here, which previously caused failures at ~150
runs. See below; the relevant lines from the log are marked.

	####################################### 2705/10000
	IGT-Version: 1.29-g19f996f8b (x86_64) (Linux: 6.13.0-rc2-xe+ x86_64)
	Using IGT_SRANDOM=1734033293 for randomisation
	Opened device: /dev/dri/card0
	Starting subtest: utilization-single-full-load
	(xe_drm_fdinfo:40147) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:528:
	(xe_drm_fdinfo:40147) CRITICAL: Failed assertion: percent < 105.0
	(xe_drm_fdinfo:40147) CRITICAL: error: 105.811345 >= 105.000000
	Stack trace:
	  #0 ../lib/igt_core.c:2051 __igt_fail_assert()
	  #1 ../tests/intel/xe_drm_fdinfo.c:520 check_results()
	  #2 ../tests/intel/xe_drm_fdinfo.c:464 utilization_single()
	  #3 ../tests/intel/xe_drm_fdinfo.c:852 __igt_unique____real_main806()
	  #4 ../tests/intel/xe_drm_fdinfo.c:806 main()
	  #5 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
	  #6 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
	  #7 [_start+0x25]
	Subtest utilization-single-full-load failed.
	**** DEBUG ****
	(xe_drm_fdinfo:40147) DEBUG: rcs: spinner started
	(xe_drm_fdinfo:40147) DEBUG: rcs: spinner ended (timestamp=19184279)
	(xe_drm_fdinfo:40147) DEBUG: rcs: sample 1: cycles 0, total_cycles 325162895781
	(xe_drm_fdinfo:40147) DEBUG: rcs: sample 2: cycles 19184523, total_cycles 325182184120
	(xe_drm_fdinfo:40147) DEBUG: rcs: percent: 99.461763
	(xe_drm_fdinfo:40147) DEBUG: bcs: spinner started
	(xe_drm_fdinfo:40147) DEBUG: bcs: spinner ended (timestamp=19193059)
	(xe_drm_fdinfo:40147) DEBUG: bcs: sample 1: cycles 0, total_cycles 325182330134
	(xe_drm_fdinfo:40147) DEBUG: bcs: sample 2: cycles 19193168, total_cycles 325201552996
	(xe_drm_fdinfo:40147) DEBUG: bcs: percent: 99.845522
	(xe_drm_fdinfo:40147) DEBUG: ccs: spinner started
	(xe_drm_fdinfo:40147) DEBUG: ccs: spinner ended (timestamp=19200849)
	(xe_drm_fdinfo:40147) DEBUG: ccs: sample 1: cycles 0, total_cycles 325201742269
	(xe_drm_fdinfo:40147) DEBUG: ccs: sample 2: cycles 19201013, total_cycles 325220974694
	(xe_drm_fdinfo:40147) DEBUG: ccs: percent: 99.836666
	(xe_drm_fdinfo:40147) DEBUG: vcs: spinner started
	(xe_drm_fdinfo:40147) DEBUG: vcs: spinner ended (timestamp=19281420)
	(xe_drm_fdinfo:40147) DEBUG: vcs: sample 1: cycles 0, total_cycles 325221246644
	(xe_drm_fdinfo:40147) DEBUG: vcs: sample 2: cycles 19281506, total_cycles 325240467813
	(xe_drm_fdinfo:40147) DEBUG: vcs: percent: 100.313904
	(xe_drm_fdinfo:40147) DEBUG: vecs: spinner started
[0]--->	(xe_drm_fdinfo:40147) DEBUG: vecs: spinner ended (timestamp=20348520)
[1]--->	(xe_drm_fdinfo:40147) DEBUG: vecs: sample 1: cycles 0, total_cycles 325242482039
[2]--->	(xe_drm_fdinfo:40147) DEBUG: vecs: sample 2: cycles 20348601, total_cycles 325261713058
	(xe_drm_fdinfo:40147) DEBUG: vecs: percent: 105.811345
	(xe_drm_fdinfo:40147) CRITICAL: Test assertion failure function check_results, file ../tests/intel/xe_drm_fdinfo.c:528:
	(xe_drm_fdinfo:40147) CRITICAL: Failed assertion: percent < 105.0
	(xe_drm_fdinfo:40147) CRITICAL: error: 105.811345 >= 105.000000



It fails because the total_cycles delta is smaller than the reported
cycles delta. However, [0] shows the timestamp recorded by the GPU
itself, which is reasonably close to the cycles delta (there are GPU
scheduling latencies accounted for in one but not the other, so it's not
exact). So the spinner did indeed execute for (at least) 20348520
cycles, but the reported delta for total_cycles is
325261713058 - 325242482039 == 19231019.
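
For reference, plugging the marked vecs numbers into the ratio the test
computes (assuming check_results() divides the cycles delta by the
total_cycles delta) reproduces the reported failure:

	cycles delta:       20348601 - 0                == 20348601
	total_cycles delta: 325261713058 - 325242482039 == 19231019
	percent:            100 * 20348601 / 19231019   ~= 105.81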

For me, the reason for the failure is the first sample. It reads
cycles == 0, which means the ctx hadn't been scheduled out yet to update
the counter. So, depending on how fast the CPU reads that first sample,
it may observe a total_cycles that is further along than it should be
relative to cycles: cycles may stay the same for some time while
total_cycles keeps ticking.
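
Roughly, as an illustration (not to scale):

	spinner start    sample 1             spinner end    sample 2
	      |             |                      |             |
	------+-------------+----------------------+-------------+--> time
	      |<-------- cycles delta ------------>|
	                    |<------- total_cycles delta ------->|

The spinner's runtime before sample 1 ends up in the cycles delta but
not in the total_cycles delta; when that head start outweighs the tail
between switch-out and sample 2, the computed percentage exceeds 100.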

I still think this patch makes sense from a user perspective: we should
try to get the GPU timestamp and the client timestamp close together if
the user is expected to calculate a ratio between them.

From a testing perspective, maybe it'd be better to check whether the
timestamp reported by the GPU ([0]) closely matches the delta reported
via client info ([1] and [2]) rather than calculating any percentage,
which means leaving total_cycles out of the equation for igt. Thoughts?
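
Something along these lines, as a sketch only (samples[],
spin_timestamp and the 1% tolerance are illustrative placeholders, not
existing test code):

	/*
	 * Illustrative fragment: compare the spinner's GPU-reported
	 * timestamp ([0]) with the fdinfo cycles delta ([1] and [2])
	 * within a tolerance, leaving total_cycles out entirely.
	 */
	uint64_t cycles_delta = samples[1].cycles - samples[0].cycles;
	uint64_t gpu_ts = spin_timestamp;	/* [0] in the log above */
	uint64_t tolerance = gpu_ts / 100;	/* 1%, number made up */

	igt_assert_f(cycles_delta >= gpu_ts - tolerance &&
		     cycles_delta <= gpu_ts + tolerance,
		     "cycles delta %" PRIu64 " vs gpu timestamp %" PRIu64 "\n",
		     cycles_delta, gpu_ts);

That would make the check independent of how long the CPU takes between
the two fdinfo reads.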

Lucas De Marchi


> Cc: stable@xxxxxxxxxxxxxxx # v6.11+
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3512
> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index 298a587da7f17..e307b4d6bab5a 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -338,15 +338,12 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
>  
>  	/* Accumulate all the exec queues from this client */
>  	mutex_lock(&xef->exec_queue.lock);
> -	xa_for_each(&xef->exec_queue.xa, i, q) {
> -		xe_exec_queue_get(q);
> -		mutex_unlock(&xef->exec_queue.lock);
> +	preempt_disable();
>  
> +	xa_for_each(&xef->exec_queue.xa, i, q)
>  		xe_exec_queue_update_run_ticks(q);
>  
> -		mutex_lock(&xef->exec_queue.lock);
> -		xe_exec_queue_put(q);
> -	}
> +	preempt_enable();
>  	mutex_unlock(&xef->exec_queue.lock);
>  
>  	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
> --
> 2.47.0




