On Tue, 01 Nov 2022, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
>
> At the end of each test, IGT does a drop caches call via sysfs with

sysfs?

> special flags set. One of the possible paths waits for idle with an
> infinite timeout. That causes problems for debugging issues when CI
> catches a "can't go idle" test failure. Best case, the CI system times
> out (after 90s), attempts a bunch of state dump actions and then
> reboots the system to recover it. Worst case, the CI system can't do
> anything at all and then times out (after 1000s) and simply reboots.
> Sometimes a serial port log of dmesg might be available, sometimes not.
>
> So rather than making life hard for ourselves, change the timeout to
> be 10s rather than infinite. Also, trigger the standard
> wedge/reset/recover sequence so that testing can continue with a
> working system (if possible).
>
> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ae987e92251dd..9d916fbbfc27c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -641,6 +641,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
>  			    DROP_RESET_ACTIVE | \
>  			    DROP_RESET_SEQNO | \
>  			    DROP_RCU)
> +
> +#define DROP_IDLE_TIMEOUT	(HZ * 10)

I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.

I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.

I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.

I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.

My head spins.

BR,
Jani.
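
To make the ms vs. jiffies concern concrete, here is a minimal userspace
sketch of what happens when a millisecond value is handed to an interface
that expects ticks. Everything in it is an assumption for illustration:
SKETCH_HZ stands in for the kernel's HZ, and the sketch_* helpers are
hypothetical names, not i915 or kernel functions. In the kernel itself the
conversion would go through msecs_to_jiffies().

```c
#include <assert.h>

/* Assumed tick rate for illustration only; the real HZ is a build-time
 * kernel config value. */
#define SKETCH_HZ 250UL

/* Hypothetical helper: convert milliseconds to ticks, rounding up so a
 * short timeout never becomes zero ticks. */
static unsigned long sketch_ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999UL) / 1000UL;
}

/* Hypothetical helper: convert ticks back to milliseconds (truncating). */
static unsigned long sketch_ticks_to_ms(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks * 1000UL / hz;
}
```

With the assumed tick rate of 250, a 10 second timeout written as
(SKETCH_HZ * 10) is 2500 ticks, which matches
sketch_ms_to_ticks(10000, SKETCH_HZ). An illustrative millisecond value such
as 200, passed unconverted where ticks are expected, would instead wait
sketch_ticks_to_ms(200, SKETCH_HZ) = 800 ms, which is why defines in
different units are easy to misuse.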

> +
>  static int
>  i915_drop_caches_get(void *data, u64 *val)
>  {
> @@ -661,7 +664,9 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>  		intel_gt_retire_requests(gt);
>
>  	if (val & (DROP_IDLE | DROP_ACTIVE)) {
> -		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
> +		ret = intel_gt_wait_for_idle(gt, DROP_IDLE_TIMEOUT);
> +		if (ret == -ETIME)
> +			intel_gt_set_wedged(gt);
>  		if (ret)
>  			return ret;
>  	}

-- 
Jani Nikula, Intel Open Source Graphics Center