Re: [PATCH] drm/i915/pmu: Connect engine busyness stats from GuC to pmu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 06/10/2021 21:45, Umesh Nerlige Ramappa wrote:
On Wed, Oct 06, 2021 at 10:11:58AM +0100, Tvrtko Ursulin wrote:

[snip]

@@ -762,12 +764,25 @@ submission_disabled(struct intel_guc *guc)
 static void disable_submission(struct intel_guc *guc)
 {
     struct i915_sched_engine * const sched_engine = guc->sched_engine;
+    struct intel_gt *gt = guc_to_gt(guc);
+    struct intel_engine_cs *engine;
+    enum intel_engine_id id;
+    unsigned long flags;
     if (__tasklet_is_enabled(&sched_engine->tasklet)) {
         GEM_BUG_ON(!guc->ct.enabled);
         __tasklet_disable_sync_once(&sched_engine->tasklet);
         sched_engine->tasklet.callback = NULL;
     }
+
+    cancel_delayed_work(&guc->timestamp.work);

I am not sure when disable_submission gets called so a question - could it be important to call cancel_delayed_work_sync here to ensure if the worker was running it had exited before proceeding?

disable_submission is called in the reset_prepare path for uc resets. I see this happening only with busy-hang test which does a global gt reset. The counterpart for this is the guc_init_engine_stats which is called post reset in the path to initialize GuC.

I tried cancel_delayed_work_sync both here and in park. Seems to work fine, so will change the calls to _sync versions.

From park is not allowed to sleep so can't do sync from there. It might have been my question which put you on a wrong path, sorry. Now I think question remains what happens if the ping worker happens to be sampling GuC state as GuC is being reset? Do you need some sort of a lock to protect that, or make sure worker skips if reset in progress?


Also, does this interact with the open about resets? Should/could parking helper be called from here?

It is related to reset. Below, I am only updating the engine prev_total to 0 since it gets reset on gt reset. I thought that's all we need to keep the busyness increasing monotonically. By calling parking helper, are you suggesting we should update the other stats too (total, start, gt_stamp etc.)?

Don't know, was just asking.

Looking at it now again, resetting prev_total looks correct to me if it tracks rec->total_runtime which is also reset by GuC. Engine->stats.total_gt_clks is then purely software managed state which you only keep adding to. Yes looks fine to me.


+    mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
+             guc->timestamp.ping_delay);
+}
+
+static int guc_action_enable_usage_stats(struct intel_guc *guc)
+{
+    u32 offset = intel_guc_engine_usage_offset(guc);
+    u32 action[] = {
+        INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF,
+        offset,
+        0,
+    };
+
+    return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+static void guc_init_engine_stats(struct intel_guc *guc)
+{
+    struct intel_gt *gt = guc_to_gt(guc);
+    intel_wakeref_t wakeref;
+
+    mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
+             guc->timestamp.ping_delay);

Not sure how this slots in with unpark. It will probably be called two times but it also probably does not matter? If you can figure it out perhaps you can remove this call from here. Or maybe there is a separate path where disable-enable can be called without the park-unpark transition. In which case you could call the unpark helper here. Not sure really.

- disable_submission pairs with guc_init_engine_stats for the gt reset  path.
- park/unpark just follow the gt_park/gt_unpark paths.

I haven't checked if reset eventually results in park/unpark or if they are separate paths though. In the reset path, there are a bunch of i915_requests going on, so difficult to say if reset caused the gt_park/gt_unpark or was it the requests.

The cases where mod_delayed_work is called twice are:

1) module load
2) i915_gem_resume (based on rc6-suspend test)

In both cases, unpark is followed by guc_init_engine_stats. Looking a bit at what is returned from the mod_delayed_work, I see that it just modifies the time if the work is already queued/pending, so I am thinking we should be okay.

I don't see cancel getting called twice without a mod_delayed_work in between.

Sounds good.

Regards,

Tvrtko



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux