Quoting Bloomfield, Jon (2019-08-26 15:08:02) > > -----Original Message----- > > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > + * While the engine is active, we send a periodic pulse along the entire > > along the entire what? s/entire/engine/ > > + * to check on its health and to flush any idle-barriers. If that request > > + * is stuck, and we fail to preempt it, we declare the engine hung and > > + * issue a reset -- in the hope that restores progress. > > + */ > > + > > +static long delay(void) > > Probably NOT the best function name in the world, ever. 'delay' could equally be a verb, so it's not self-descriptive. Is it a noun or verb in this case? Or just shorthand. > > +{ > > + const long t = > > msecs_to_jiffies(CONFIG_DRM_I915_HEARTBEAT_INTERVAL); > > + > > + return round_jiffies_up_relative(t); > > +} > > +static void heartbeat(struct work_struct *wrk) > > +{ > > + struct i915_sched_attr attr = { > > + .priority = I915_USER_PRIORITY(I915_PRIORITY_MIN), > > + }; > > + struct intel_engine_cs *engine = > > + container_of(wrk, typeof(*engine), heartbeat.work); > > + struct intel_context *ce = engine->kernel_context; > > + struct i915_request *rq; > > + > > + if (!intel_engine_pm_get_if_awake(engine)) > > + return; > > + > > + rq = engine->last_heartbeat; > > + if (rq && i915_request_completed(rq)) { > > + i915_request_put(rq); > > + engine->last_heartbeat = NULL; > > + } > > + > > + if (intel_gt_is_wedged(engine->gt)) > > + goto out; > > + > > + if (engine->last_heartbeat) { > > + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) { > > + struct drm_printer p = drm_debug_printer(__func__); > > + > > + intel_engine_dump(engine, &p, > > + "%s heartbeat not ticking\n", > > + engine->name); > > + } > > + > > + if (engine->schedule && > > + rq->sched.attr.priority < I915_PRIORITY_BARRIER) { > > + attr.priority = > > + > > I915_USER_PRIORITY(I915_PRIORITY_HEARTBEAT); > > + if (rq->sched.attr.priority >= attr.priority) > > + attr.priority = I915_PRIORITY_BARRIER; > > + > > + local_bh_disable(); > > + engine->schedule(rq, &attr); > > + local_bh_enable(); > > + } else { > > + intel_gt_handle_error(engine->gt, engine->mask, > > + I915_ERROR_CAPTURE, > > + "stopped heartbeat on %s", > > + engine->name); > > + } > > + goto out; > > + } > > + > > + if (engine->wakeref_serial == engine->serial) > > + goto out; > > + > > + mutex_lock(&ce->timeline->mutex); > > + > > + intel_context_enter(ce); > > + rq = __i915_request_create(ce, GFP_NOWAIT | __GFP_NOWARN); > > + intel_context_exit(ce); > > + if (IS_ERR(rq)) > > + goto unlock; > > + > > + idle_pulse(engine, rq); > > + if (i915_modparams.enable_hangcheck) > > + engine->last_heartbeat = i915_request_get(rq); > > + > > + __i915_request_commit(rq); > > + __i915_request_queue(rq, &attr); > > + > > +unlock: > > + mutex_unlock(&ce->timeline->mutex); > > +out: > > + schedule_delayed_work(&engine->heartbeat, delay()); > > Isn't engine->heartbeat now NULL in some cases? engine->heartbeat, the worker vs engine->last_heartbeat Maybe, struct intel_engine_heartbeat { work; systole; }; > > diff --git a/drivers/gpu/drm/i915/i915_getparam.c > > b/drivers/gpu/drm/i915/i915_getparam.c > > index 5d9101376a3d..e6c351080593 100644 > > --- a/drivers/gpu/drm/i915/i915_getparam.c > > +++ b/drivers/gpu/drm/i915/i915_getparam.c > > @@ -78,8 +78,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void > > *data, > > return -ENODEV; > > break; > > case I915_PARAM_HAS_GPU_RESET: > > - value = i915_modparams.enable_hangcheck && > > - intel_has_gpu_reset(i915); > > + value = intel_has_gpu_reset(i915); > > Don't understand this tweak. We haven't really changed the essence of hangcheck, just improved it, so why do we change this get_param? I deleted the modparam in earlier patches. But anticipated you would object... -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx