On 02/24/23 16:14, Dietmar Eggemann wrote:
> On 23/02/2023 16:38, Qais Yousef wrote:
>
> IMHO the patch title is misleading since what you want to avoid in
> certain cases is that the RD DL accounting is updated.

The code calls it rebuild_root_domain() ..

>
> > On 02/06/23 22:14, Qais Yousef wrote:
> >> Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")

.. and so is the original patch title.

I think I have enough explanation in the commit message and renamed the
function name to be more descriptive too.

> >> enabled rebuilding root domain on cpuset and hotplug operations to
> >> correct deadline accounting.
> >>
> >> Rebuilding root domain is a slow operation and we see 10+ of ms delays
> >> on suspend-resume because of that (worst case captures 20ms which
> >> happens often).
> >>
> >> Since nothing is expected to change on suspend-resume operation; skip
> >> rebuilding the root domains to regain the some of the time lost.
> >>
> >> Achieve this by refactoring the code to pass whether dl accoutning needs
> >> an update to rebuild_sched_domains(). And while at it, rename
> >> rebuild_root_domains() to update_dl_rd_accounting() which I believe is
> >> a more representative name since we are not really rebuilding the root
> >> domains, but rather updating dl accounting at the root domain.
> >>
> >> Some users of rebuild_sched_domains() will skip dl accounting update
> >> now:
> >>
> >> 	* Update sched domains when relaxing the domain level in cpuset
> >> 	  which only impacts searching level in load balance
>
> This one is cpuset related. (1)
>
> >> 	* update sched domains when cpufreq governor changes and we need
> >> 	  to create the perf domains
>
> This one is drivers/base/arch_topology.c [arm/arm64/...] related. (2)
>
> There are several levels of passing this `update_dl_accounting`
> information through.
> I guess it looks like this:
>
>                                         update_dl_accounting
>
> arm/arm64/riscv/parisc specific:
> update_topology_flags_workfn()          true
> rebuild_sched_domains_energy()          false (2)
>
> cpuset_hotplug_workfn()                 cpus_updated ||
>                             force_rebuild == CPUSET_FORCE_REBUILD_PRS_ERROR
>
> ->rebuild_sched_domains(update_dl_accounting)
>
>   update_cpumasks_hier()                true
>   update_relax_domain_level()           false (1)
>   update_flag()                         true
>   update_prstate()                      true
>
>   ->rebuild_sched_domains_locked(update_dl_accounting)
>
>     ->partition_and_rebuild_sched_domains(..., update_dl_accounting)
>
>         if (update_dl_accounting)
>           update_dl_rd_accounting()
>
>
> There is already a somehow hidden interface for `sd/rd rebuild`
>
>   int __weak arch_update_cpu_topology(void)
>
> which lets partition_sched_domains_locked() figure out whether sched
> domains have to be rebuild..
>
> But in your case it is more on the interface `cpuset/hotplug -> sd/rd
> rebuild` and not only `arch -> `sd/rd rebuild``.
>
> IMHO, it would be still nice to have only one way to tell `sd/rd
> rebuild` what to do and what not to do during sd/rd/(pd) rebuild.

IIUC you're suggesting to introduce some new mechanism to detect if hotplug has
led to a cpu disappearing or not and use that instead? Are you saying I can use
arch_update_cpu_topology() for that? Something like this?

	diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
	index e5ddc8e11e5d..60c3dcf06f0d 100644
	--- a/kernel/cgroup/cpuset.c
	+++ b/kernel/cgroup/cpuset.c
	@@ -1122,7 +1122,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
	 {
	        mutex_lock(&sched_domains_mutex);
	        partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
	-       if (update_dl_accounting)
	+       if (arch_update_cpu_topology())
	                update_dl_rd_accounting();
	        mutex_unlock(&sched_domains_mutex);
	 }

I am not keen on this. arm64 seems to just read a value without a side effect.
But x86 does reset this value so we can't read it twice in the same call tree
and I'll have to extract it.
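To make the "can't read it twice" concern concrete, here is a minimal userspace
sketch of a read-and-reset flag. It is only a model of the behaviour described
above, not kernel code: every model_* name, topology_changed and dl_updates are
hypothetical and exist purely for illustration. The point it shows is that a
second query in the same call tree sees the flag already cleared, so the value
has to be sampled once and passed down as a parameter.

```c
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code. */
static bool topology_changed;	/* stands in for an arch-private flag */

/* Models a read-and-reset query: reading clears the flag as a side effect. */
static bool model_arch_update_cpu_topology(void)
{
	bool ret = topology_changed;

	topology_changed = false;
	return ret;
}

static int dl_updates;		/* counts modelled DL accounting updates */

/* The callee takes the already-sampled value instead of querying again. */
static void model_partition_and_rebuild(bool update_dl_accounting)
{
	if (update_dl_accounting)
		dl_updates++;
}

/* Sample the flag once at the top of the call tree and pass it down. */
static void model_rebuild(void)
{
	bool changed = model_arch_update_cpu_topology();

	model_partition_and_rebuild(changed);
}
```

If model_partition_and_rebuild() queried the flag itself after an earlier
caller had already done so, it would always see false and skip the update,
which is why the value would need extracting first.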

The better solution that was discussed before is to not iterate through every
task in the system and let cpuset track when dl tasks are added to it and do
smarter iteration. ATM even if there are no dl tasks in the system we'll
blindly go through every task in the hierarchy to update nothing.

But I'll leave that to Juri to address if he wants. The original change has
introduced a regression and people have noticed when phones cycle through
suspend resume (screen unlock). Juri - could you please chip in on how you want
to address this regression? In theory I should be just a reporter, but trying
my best to help ;-)


Cheers

--
Qais Yousef