On 06/10/2021 21:45, Umesh Nerlige Ramappa wrote:
On Wed, Oct 06, 2021 at 10:11:58AM +0100, Tvrtko Ursulin wrote:
[snip]
@@ -762,12 +764,25 @@ submission_disabled(struct intel_guc *guc)
static void disable_submission(struct intel_guc *guc)
{
struct i915_sched_engine * const sched_engine = guc->sched_engine;
+ struct intel_gt *gt = guc_to_gt(guc);
+ struct intel_engine_cs *engine;
+ enum intel_engine_id id;
+ unsigned long flags;
if (__tasklet_is_enabled(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
__tasklet_disable_sync_once(&sched_engine->tasklet);
sched_engine->tasklet.callback = NULL;
}
+
+ cancel_delayed_work(&guc->timestamp.work);
I am not sure when disable_submission gets called so a question -
could it be important to call cancel_delayed_work_sync here to ensure
if the worker was running it had exited before proceeding?
disable_submission is called in the reset_prepare path for uc resets. I
see this happening only with busy-hang test which does a global gt
reset. The counterpart for this is the guc_init_engine_stats which is
called post reset in the path to initialize GuC.
I tried cancel_delayed_work_sync both here and in park. Seems to work
fine, so will change the calls to _sync versions.
From park is not allowed to sleep so can't do sync from there. It might
have been my question which put you on a wrong path, sorry. Now I think
question remains what happens if the ping worker happens to be sampling
GuC state as GuC is being reset? Do you need some sort of a lock to
protect that, or make sure worker skips if reset in progress?
Also, does this interact with the open about resets? Should/could
parking helper be called from here?
It is related to reset. Below, I am only updating the engine prev_total
to 0 since it gets reset on gt reset. I thought that's all we need to
keep the busyness increasing monotonically. By calling parking helper,
are you suggesting we should update the other stats too (total, start,
gt_stamp etc.)?
Don't know, was just asking.
Looking at it now again, resetting prev_total looks correct to me if it
tracks rec->total_runtime which is also reset by GuC.
Engine->stats.total_gt_clks is then purely software managed state which
you only keep adding to. Yes looks fine to me.
+ mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
+ guc->timestamp.ping_delay);
+}
+
+static int guc_action_enable_usage_stats(struct intel_guc *guc)
+{
+ u32 offset = intel_guc_engine_usage_offset(guc);
+ u32 action[] = {
+ INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF,
+ offset,
+ 0,
+ };
+
+ return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+static void guc_init_engine_stats(struct intel_guc *guc)
+{
+ struct intel_gt *gt = guc_to_gt(guc);
+ intel_wakeref_t wakeref;
+
+ mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
+ guc->timestamp.ping_delay);
Not sure how this slots in with unpark. It will probably be called two
times but it also probably does not matter? If you can figure it out
perhaps you can remove this call from here. Or maybe there is a
separate path where disable-enable can be called without the
park-unpark transition. In which case you could call the unpark helper
here. Not sure really.
- disable_submission pairs with guc_init_engine_stats for the gt reset
path.
- park/unpark just follow the gt_park/gt_unpark paths.
I haven't checked if reset eventually results in park/unpark or if they
are separate paths though. In the reset path, there are a bunch of
i915_requests going on, so difficult to say if reset caused the
gt_park/gt_unpark or was it the requests.
The cases where mod_delayed_work is called twice are:
1) module load
2) i915_gem_resume (based on rc6-suspend test)
In both cases, unpark is followed by guc_init_engine_stats. Looking a
bit at what is returned from the mod_delayed_work, I see that it just
modifies the time if the work is already queued/pending, so I am
thinking we should be okay.
I don't see cancel getting called twice without a mod_delayed_work in
between.
Sounds good.
Regards,
Tvrtko