On Thu, Jan 12, 2023 at 04:56:07PM +0000, Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote: > +static int drmcs_can_attach(struct cgroup_taskset *tset) > +{ > + int ret; > + > + /* > + * As processes are getting moved between groups we need to ensure > + * both that the old group does not see a sudden downward jump in the > + * GPU utilisation, and that the new group does not see a sudden jump > + * up with all the GPU time clients belonging to the migrated process > + * have accumulated. > + * > + * To achieve that we suspend the scanner until the migration is > + * completed where the resume at the end ensures both groups start > + * observing GPU utilisation from a reset state. > + */ > + > + ret = mutex_lock_interruptible(&drmcg_mutex); > + if (ret) > + return ret; > + start_suspend_scanning(); > + mutex_unlock(&drmcg_mutex); > + > + finish_suspend_scanning(); Here's scanning suspension, communicated via root_drmcs.scanning_suspended = true; root_drmcs.suspended_period_us = root_drmcs.period_us; root_drmcs.period_us = 0; but I don't see those used in scan_worker() and the scanning traversal can apparently run concurrently with a task migration. > [...] > +static bool > +__start_scanning(struct drm_cgroup_state *root, unsigned int period_us) > [...] > + css_for_each_descendant_post(node, &root->css) { > [...] > + active = drmcs_get_active_time_us(drmcs); > + if (period_us && active > drmcs->prev_active_us) > + drmcs->active_us += active - drmcs->prev_active_us; > + drmcs->prev_active_us = active; drmcs_get_active_time_us() could count a task's contribution here, the task would migrate to a different drmcs, and it'd be counted 2nd time.
Attachment:
signature.asc
Description: Digital signature