On Thu, Oct 07, 2021 at 09:17:34AM +0100, Tvrtko Ursulin wrote:
On 06/10/2021 21:45, Umesh Nerlige Ramappa wrote:
On Wed, Oct 06, 2021 at 10:11:58AM +0100, Tvrtko Ursulin wrote:
[snip]
@@ -762,12 +764,25 @@ submission_disabled(struct intel_guc *guc)
static void disable_submission(struct intel_guc *guc)
{
struct i915_sched_engine * const sched_engine = guc->sched_engine;
+ struct intel_gt *gt = guc_to_gt(guc);
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+ unsigned long flags;
if (__tasklet_is_enabled(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
__tasklet_disable_sync_once(&sched_engine->tasklet);
sched_engine->tasklet.callback = NULL;
}
+
+ cancel_delayed_work(&guc->timestamp.work);
I am not sure when disable_submission gets called so a question -
could it be important to call cancel_delayed_work_sync here to
ensure if the worker was running it had exited before proceeding?
disable_submission is called in the reset_prepare path for uc
resets. I see this happening only with busy-hang test which does a
global gt reset. The counterpart for this is the
guc_init_engine_stats which is called post reset in the path to
initialize GuC.
I tried cancel_delayed_work_sync both here and in park. Seems to
work fine, so will change the calls to _sync versions.
From park is not allowed to sleep so can't do sync from there. It
might have been my question which put you on a wrong path, sorry. Now
I think question remains what happens if the ping worker happens to be
sampling GuC state as GuC is being reset? Do you need some sort of a
lock to protect that, or make sure worker skips if reset in progress?
If ping ran after the actual gt reset, we should be okay. If it ran
after we reset prev_total and before gt reset, then we have bad
busyness. At the same time, skipping ping risks timestamp overflow. I am
thinking skip ping, but update all stats in the reset_prepare path.
reset_prepare is running with pm runtime.
On a different note, during reset, we need to store now-start into the
total_gt_clks also because we may lose that information in the next pmu
query or ping (post reset). Maybe I will store active_clks instead of
running in the stats to do that.
Thanks,
Umesh