On 09/07/2017 11:21 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
>> Remove circular dependency deadlock in a scenario where hotplug of CPU is
>> being done while there is updation in cgroup and cpuset triggered from
>> userspace.
>
> You've forgotten to mention your solution to the deadlock, namely
> inverting cpuset_mutex and cpu_hotplug_lock.
>
>> Signed-off-by: Prateek Sood <prsood@xxxxxxxxxxxxxx>
>> ---
>>  kernel/cgroup/cpuset.c | 32 +++++++++++++++++++-------------
>>  1 file changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 2f4039b..60dc0ac 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -816,16 +816,15 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>   * 'cpus' is removed, then call this routine to rebuild the
>>   * scheduler's dynamic sched domains.
>>   *
>> - * Call with cpuset_mutex held.  Takes get_online_cpus().
>>   */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>  	struct sched_domain_attr *attr;
>>  	cpumask_var_t *doms;
>>  	int ndoms;
>>
>> +	lockdep_assert_cpus_held();
>>  	lockdep_assert_held(&cpuset_mutex);
>> -	get_online_cpus();
>>
>>  	/*
>>  	 * We have raced with CPU hotplug. Don't do anything to avoid
>> @@ -833,27 +832,27 @@ static void rebuild_sched_domains_locked(void)
>>  	 * Anyways, hotplug work item will rebuild sched domains.
>>  	 */
>>  	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>> -		goto out;
>> +		return;
>>
>>  	/* Generate domain masks and attrs */
>>  	ndoms = generate_sched_domains(&doms, &attr);
>>
>>  	/* Have scheduler rebuild the domains */
>>  	partition_sched_domains(ndoms, doms, attr);
>> -out:
>> -	put_online_cpus();
>>  }
>>  #else /* !CONFIG_SMP */
>> -static void rebuild_sched_domains_locked(void)
>> +static void rebuild_sched_domains_cpuslocked(void)
>>  {
>>  }
>>  #endif /* CONFIG_SMP */
>>
>>  void rebuild_sched_domains(void)
>>  {
>> +	get_online_cpus();
>>  	mutex_lock(&cpuset_mutex);
>> -	rebuild_sched_domains_locked();
>> +	rebuild_sched_domains_cpuslocked();
>>  	mutex_unlock(&cpuset_mutex);
>> +	put_online_cpus();
>>  }
>
> But if you invert these locks, the need for cpuset_hotplug_workfn() goes
> away, at least for the CPU part, and we can make it synchronous again.
> Yay!!
>
> Also, I think new code should use cpus_read_lock() instead of
> get_online_cpus().

Thanks for the review comments, Peter. For the patch related to the
circular deadlock, I will send an updated version.
The callbacks making a call to cpuset_hotplug_workfn() in the hotplug path are:

	[CPUHP_AP_ACTIVE] = {
		.name			= "sched:active",
		.startup.single		= sched_cpu_activate,
		.teardown.single	= sched_cpu_deactivate,
	},

If we make cpuset_hotplug_workfn() synchronous, a deadlock might happen:

_cpu_down()
  cpus_write_lock()              // held
  cpuhp_kick_ap_work()
    cpuhp_kick_ap()
      __cpuhp_kick_ap()
        wake_up_process()        // wakes cpuhp_thread_fun
        wait_for_ap_thread()     // waits for completion from cpuhp_thread_fun()

cpuhp_thread_fun()
  cpuhp_invoke_callback()
    sched_cpu_deactivate()
      cpuset_cpu_inactive()
        cpuset_update_active_cpus()
          cpuset_hotplug_work()
            rebuild_sched_domains()
              cpus_read_lock()   // waits, as the write lock is held in _cpu_down()

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative
Project