On Fri, Feb 10, 2023 at 12:08 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> On Thu, Feb 9, 2023 at 3:20 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > On Wed, Feb 08, 2023 at 06:29:56PM +0100, Michal Koutný wrote:
> > > On Thu, Feb 09, 2023 at 12:16:54AM +0800, Kairui Song <ryncsn@xxxxxxxxx> wrote:
> > > > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > > > Signed-off-by: Kairui Song <ryncsn@xxxxxxxxx>
> > >
> > > Typo?
> > >
> > > > -static inline struct psi_group *task_psi_group(struct task_struct *task)
> > > > +static inline struct psi_group *psi_iter_first(struct task_struct *task, void **iter)
> > > >  {
> > > >  #ifdef CONFIG_CGROUPS
> > > > -	if (static_branch_likely(&psi_cgroups_enabled))
> > > > -		return cgroup_psi(task_dfl_cgroup(task));
> > > > +	if (static_branch_likely(&psi_cgroups_enabled)) {
> > > > +		struct cgroup *cgroup = task_dfl_cgroup(task);
> > > > +
> > > > +		*iter = cgroup_parent(cgroup);
> > >
> > > This seems to skip a cgroup level -- maybe that's the observed
> > > performance gain?
> >
> > Hm, I don't think it does. It sets up *iter to point to the parent for
> > the _next() call, but it returns task_dfl_cgroup()->psi. The next call
> > does the same: cgroup = *iter, *iter = parent, return cgroup->psi.
> >
> > It could be a bit more readable to have *iter always point to the
> > current cgroup - but no strong preference either way from me:
> >
> > psi_groups_first(task, iter)
> > {
> > #ifdef CONFIG_CGROUPS
> > 	if (static_branch_likely(&psi_cgroups_enabled)) {
> > 		struct cgroup *cgroup = task_dfl_cgroup(task);
> >
> > 		*iter = cgroup;
> > 		return cgroup_psi(cgroup);
> > 	}
> > #endif
> > 	return &psi_system;
> > }
> >
> > psi_groups_next(iter)
> > {
> > #ifdef CONFIG_CGROUPS
> > 	if (static_branch_likely(&psi_cgroups_enabled)) {
> > 		struct cgroup *cgroup = *iter;
> >
> > 		if (cgroup) {
> > 			*iter = cgroup_parent(cgroup);
> > 			return cgroup_psi(cgroup);
> > 		}
> > 	}
> > 	return NULL;
> > #endif
> > }
>
> It should be like this, right? For psi_groups_next, retrieving the
> cgroup parent should be done before the "if (cgroup)" check:
>
> + psi_groups_next(iter)
> + {
> + #ifdef CONFIG_CGROUPS
> + 	if (static_branch_likely(&psi_cgroups_enabled)) {
> + 		struct cgroup *cgroup = *iter;
> +
> + 		cgroup = cgroup_parent(cgroup);
> + 		if (cgroup) {
> + 			*iter = cgroup;
> + 			return cgroup_psi(cgroup);
> + 		}
> + 	}
> + 	return NULL;
> + #endif
> + }
>
> Thanks for the suggestion! I think your style is indeed better.
>
> I tried to re-benchmark the code just in case, and it seems my previous
> benchmark results are no longer accurate: some numbers changed after I
> rebased onto the latest master, or maybe 100 runs of perfpipe are simply
> not enough to get a stable result.
>
> After retesting a few times, the final conclusion of the commit message
> still holds; I will post V2 after more testing.

Hi,

I just ran the perf bench test for a few days on a machine to get a
stable average for the latest rebased patch.
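For context on what this benchmark stresses: with the patch, every PSI
state change walks the task's cgroup hierarchy through the new iterator
pair. A rough sketch of such a call site follows (the psi_group_change()
argument list is abbreviated from memory and only illustrative, not the
exact patch):

	void *iter;
	struct psi_group *group;

	/* One psi_group per level, from the task's cgroup up to the root. */
	for (group = psi_iter_first(task, &iter); group;
	     group = psi_iter_next(&iter))
		psi_group_change(group, cpu, clear, set, now, true);

The pipe test forces a sleep/wakeup per iteration, so it exercises this
walk constantly and is quite sensitive to the per-level cost.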
And it is indeed faster when the cgroup hierarchy is not too deep:

  With rootcg:    55718.9 op/s (unpatched) vs 55862.2 op/s (patched)
  With 5 levels:  49975.5 op/s (unpatched) vs 50778.5 op/s (patched)

The previous tests were a bit biased, since I only ran the test 100 * 3
times; the numbers may also be sensitive to unrelated kernel structure
changes.

But I missed one important thing in my previous tests: performance
actually drops heavily with deeper levels of cgroups:

  With 8 levels:  48902.4 op/s (unpatched) vs 47476.6 op/s (patched)
  With 50 levels: 25605.7375 op/s (unpatched) vs 20417.275 op/s (patched)

That's a terrible regression :( Sorry for missing such a critical point
in v1; let's drop this series until there is a better way to optimize it.

My guess at the cause: when there are that many cgroup structures, the
CPU cache can't hold them all, and since *parent shares a cacheline with
the rest of struct psi_group (like the bool enabled, which is accessed
on every state change), the unpatched kernel ends up faster.
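To illustrate the layout point: in the unpatched kernel the walk follows
psi_group's own parent pointer, which sits next to fields that are
touched on every state change anyway (field order here is my reading of
the struct, shown only for illustration):

	struct psi_group {
		struct psi_group *parent;	/* unpatched walk follows this */
		bool enabled;			/* tested on every state change */
		/* ... per-CPU counters, totals, avg works ... */
	};

So once ->enabled is loaded, ->parent comes from the same cacheline for
free. The patched iterator walks struct cgroup parents instead, touching
extra cachelines per level, which only shows up once the hierarchy is
deep enough to fall out of cache.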