[PATCH] drm/i915/selftests: Use preemption timeout on cleanup

Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx> · Fri, 13 Dec 2024 19:59:48 +0100

Many selftests call igt_flush_test() on cleanup.  With the default
preemption timeout of compute engines raised to 7.5 seconds, the hardcoded
flush timeout of 3 seconds is too short.  That results in the GPU being
forcibly wedged and the kernel tainted, which then triggers an IGT abort.
CI BAT runs lose a part of their expected coverage.

Calculate the flush timeout based on the longest preemption timeout
currently configured for any engine, and wait for twice that long.  That
way, selftests can still report detected issues as non-critical, and the
GPU gets a chance to recover from preemptible hangs and prepare for smooth
execution of the next test cases.

Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12061
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>
---
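Note on the timeout arithmetic: the standalone sketch below is only an
illustration of how HZ * timeout_ms / 500 behaves, not the driver code.
SKETCH_HZ and both helper names are hypothetical, and the tick rate of
1000 is an assumed value, since the real HZ depends on kernel
configuration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's HZ (ticks per second). */
#define SKETCH_HZ 1000UL

/* Longest preemption timeout in ms among n engines, or 0 if n is 0. */
static unsigned long max_preempt_timeout_ms(const unsigned long *timeouts_ms,
					    size_t n)
{
	unsigned long max_ms = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (timeouts_ms[i] > max_ms)
			max_ms = timeouts_ms[i];
	}

	return max_ms;
}

/*
 * Same arithmetic as in the patch: HZ * ms / 500 equals
 * 2 * (ms / 1000) * HZ, i.e. twice the timeout expressed in ticks.
 */
static unsigned long flush_timeout_ticks(unsigned long max_ms)
{
	/* Guard the multiplication against overflow in this sketch. */
	assert(max_ms <= ULONG_MAX / SKETCH_HZ);

	return SKETCH_HZ * max_ms / 500;
}
```

With a 7.5 second compute engine timeout as the longest value, the sketch
gives a wait of twice that timeout, in place of the former fixed 3
seconds.  A longest timeout of 0 would give a wait of 0 ticks.
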
 drivers/gpu/drm/i915/selftests/igt_flush_test.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index 29110abb4fe05..d4b216065f2eb 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -19,12 +19,21 @@ int igt_flush_test(struct drm_i915_private *i915)
 	int ret = 0;
 
 	for_each_gt(gt, i915, i) {
+		struct intel_engine_cs *engine;
+		unsigned long timeout_ms = 0;
+		unsigned int id;
+
 		if (intel_gt_is_wedged(gt))
 			ret = -EIO;
 
+		for_each_engine(engine, gt, id) {
+			if (engine->props.preempt_timeout_ms > timeout_ms)
+				timeout_ms = engine->props.preempt_timeout_ms;
+		}
+
 		cond_resched();
 
-		if (intel_gt_wait_for_idle(gt, HZ * 3) == -ETIME) {
+		if (intel_gt_wait_for_idle(gt, HZ * timeout_ms / 500) == -ETIME) {
 			pr_err("%pS timed out, cancelling all further testing.\n",
 			       __builtin_return_address(0));
 
-- 
2.47.1