On 23-May 16:18, Waiman Long wrote:
> On 05/23/2018 01:34 PM, Patrick Bellasi wrote:
> > Hi Waiman,
> >
> > On 17-May 16:55, Waiman Long wrote:
> >
> > [...]
> >
> >> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
> >>  	int ndoms = 0;		/* number of sched domains in result */
> >>  	int nslot;		/* next empty doms[] struct cpumask slot */
> >>  	struct cgroup_subsys_state *pos_css;
> >> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
> >>
> >>  	doms = NULL;
> >>  	dattr = NULL;
> >>  	csa = NULL;
> >>
> >>  	/* Special case for the 99% of systems with one, full, sched domain */
> >> -	if (is_sched_load_balance(&top_cpuset)) {
> >> +	if (root_load_balance && !top_cpuset.isolation_count) {
> > Perhaps I'm missing something but, it seems to me that, when the two
> > conditions above are true, then we are going to destroy and rebuild
> > the exact same scheduling domains.
> >
> > IOW, on 99% of systems where:
> >
> >    is_sched_load_balance(&top_cpuset)
> >    top_cpuset.isolation_count = 0
> >
> > since boot time and forever, then every time we update a value for
> > cpuset.cpus we keep rebuilding the same SDs.
> >
> > It's not strictly related to this patch, the same already happens in
> > mainline based just on the first condition, but since you are extending
> > that optimization, perhaps you can tell me where I'm possibly wrong or
> > which cases I'm not considering.
> >
> > I'm interested mainly because on Android systems those conditions
> > are always true and we see SDs rebuilds every time we write
> > something in cpuset.cpus, which ultimately accounts for almost all the
> > 6-7[ms] time required for the write to return, depending on the CPU
> > frequency.
> >
> > Cheers Patrick
> >
> Yes, that is true. I will look into how to further optimize this. Thanks
> for the suggestion.

FWIW, following is my take on top of your series.
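To make the two conditions discussed above concrete, here is a minimal userspace C model of the shortcut. It is only a sketch under stated assumptions: struct cpuset_model, single_full_domain(), on_cpus_write() and the rebuilds counter are hypothetical names invented for illustration. They mirror is_sched_load_balance(&top_cpuset) and top_cpuset.isolation_count but are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the root cpuset state. Field names
 * mirror the kernel identifiers discussed in this thread, but this is
 * NOT the kernel's struct cpuset.
 */
struct cpuset_model {
	bool sched_load_balance; /* models is_sched_load_balance(&top_cpuset) */
	int isolation_count;     /* models top_cpuset.isolation_count */
};

/* Counts simulated generate_sched_domains() + partition_sched_domains() runs. */
static int rebuilds;

/* True when the single, full sched domain set up at boot is already correct. */
static bool single_full_domain(const struct cpuset_model *top)
{
	return top->sched_load_balance && top->isolation_count == 0;
}

/* Simulated cpuset.cpus write: rebuild only when the shortcut does not apply. */
static void on_cpus_write(const struct cpuset_model *top)
{
	assert(top->isolation_count >= 0); /* a count can never go negative */

	if (single_full_domain(top))
		return; /* would rebuild identical domains: skip the work */

	rebuilds++; /* stands in for the expensive rebuild */
}
```

With the default root state (load balancing on, no isolated CPUs), any number of simulated cpuset.cpus writes leaves rebuilds at zero. Clearing sched_load_balance, or raising isolation_count above zero, makes every write count as a rebuild again.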
With the following patch applied, I see the average execution time of
rebuild_sched_domains_locked() drop from 1.4[ms] to 40[us] while running
60 /tg1/cpuset.cpus switches in a loop on a Juno R2 Arm board using the
performance cpufreq governor.

---8<---
From 84bb8137ce79f74849d97e30871cf67d06d8d682 Mon Sep 17 00:00:00 2001
From: Patrick Bellasi <patrick.bellasi@xxxxxxx>
Date: Wed, 23 May 2018 16:33:06 +0100
Subject: [PATCH 1/1] cgroup/cpuset: disable sched domain rebuild when not required

generate_sched_domains() already handles the "special case for 99% of
systems" that require a single, full sched domain at the root, spanning
all the CPUs. However, the current support is based on an expensive
sequence of operations which destroy and recreate the exact same
scheduling domain configuration.

If we notice that:

 1) CPUs in "cpuset.isolcpus" are excluded from load balancing by the
    isolcpus= kernel boot option, and will never be load balanced
    regardless of the value of "cpuset.sched_load_balance" in any cpuset.

 2) the root cpuset has load_balance enabled by default at boot, and it's
    the only parameter which userspace can change at run-time.

then we know that, by default, every system comes up with a complete and
properly configured set of scheduling domains covering all the CPUs.

Thus, on every system, unless the user explicitly disables load balance
for the top_cpuset, the scheduling domains already configured at boot
time by the scheduler/topology code, and updated as a consequence of
hotplug events, are already properly configured for cpuset too.

This configuration is the default one for 99% of systems, and it's also
the one used by most Android devices, which never disable load balance
for the top_cpuset.

Thus, while load balance is enabled for the top_cpuset,
destroying/rebuilding the scheduling domains at every cpuset.cpus
reconfiguration is a useless operation which will always produce the
same result.
Let's move the "special case" optimization earlier, into:

   rebuild_sched_domains_locked()

thus completely skipping the expensive:

   generate_sched_domains()
   partition_sched_domains()

in all the cases where we know that the scheduling domains already
defined will not be affected by any value of cpuset.cpus.

The proposed solution is the minimal change needed to optimize the case
of systems with load balance enabled at the root level and without
isolated CPUs. As soon as either of these conditions no longer holds, we
fall back to the original behavior.

Signed-off-by: Patrick Bellasi <patrick.bellasi@xxxxxxx>
Cc: Li Zefan <lizefan@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Paul Turner <pjt@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: kernel-team@xxxxxx
Cc: cgroups@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
 kernel/cgroup/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8f586e8bdc98..cff14be94678 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -874,6 +874,11 @@ static void rebuild_sched_domains_locked(void)
 	    !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
+	/* Special case for the 99% of systems with one, full, sched domain */
+	if (!top_cpuset.isolation_count &&
+	    is_sched_load_balance(&top_cpuset))
+		goto out;
+
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
 
-- 
2.15.1
---8<---

-- 
#include <best/regards.h>

Patrick Bellasi
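The latency figures in this thread (the 6-7[ms] per cpuset.cpus write mentioned earlier, and the 1.4[ms] vs 40[us] average for rebuild_sched_domains_locked()) come from repeatedly rewriting a cpuset.cpus file. A minimal userspace harness for that kind of loop might look like the sketch below. The function name avg_write_us(), the CPU lists, and any path passed to it are assumptions for illustration. Because it times the whole write(2) call, it captures end-to-end write latency rather than rebuild_sched_domains_locked() alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical harness: alternately write two CPU lists (a, b) to the file
 * at 'path' and return the average write(2) latency in microseconds, or
 * -1.0 on error. Measures end-to-end write time, not kernel internals.
 */
static double avg_write_us(const char *path, const char *a, const char *b,
			   int iterations)
{
	struct timespec t0, t1;
	double total_us = 0.0;

	assert(path && a && b);

	for (int i = 0; i < iterations; i++) {
		const char *val = (i % 2) ? b : a; /* switch between the two lists */
		int fd = open(path, O_WRONLY);

		if (fd < 0)
			return -1.0;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		ssize_t n = write(fd, val, strlen(val));
		clock_gettime(CLOCK_MONOTONIC, &t1);
		close(fd);

		if (n < 0)
			return -1.0;

		total_us += (t1.tv_sec - t0.tv_sec) * 1e6 +
			    (t1.tv_nsec - t0.tv_nsec) / 1e3;
	}
	return iterations > 0 ? total_us / iterations : 0.0;
}
```

For example, avg_write_us("/sys/fs/cgroup/cpuset/tg1/cpuset.cpus", "0-1", "0-3", 60) would average 60 alternating writes, assuming a cgroup v1 cpuset hierarchy mounted at that location and permission to write the file. Running it once before and once after applying the patch gives a rough userspace view of the same comparison.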