On 2023/7/10 23:40, Waiman Long wrote:
> On 7/10/23 11:11, Michal Koutný wrote:
>> Hello.
>>
>> On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote:
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>>>          cpuset_for_each_child(cp, css, parent)
>>>              if (is_partition_valid(cp) &&
>>>                  cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>>> +                if (!css_tryget_online(&cp->css))
>>> +                    continue;
>>>                  rcu_read_unlock();
>>>                  update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>>>                  rcu_read_lock();
>>> +                css_put(&cp->css);
>> Apologies for a possibly noob question -- why is the RCU read lock
>> temporarily dropped within the loop?
>> (Is it only because of callback_lock or cgroup_file_kn_lock (via
>> notify_partition_change()) on PREEMPT_RT?)
>>
>>
>>
>> [
>> OT question:
>>     cpuset_for_each_child(cp, css, parent)                              (1)
>>         if (is_partition_valid(cp) &&
>>             cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>>             if (!css_tryget_online(&cp->css))
>>                 continue;
>>             rcu_read_unlock();
>>             update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>>               ...
>>               update_tasks_cpumask(cp->parent)
>>                 ...
>>                 css_task_iter_start(&cp->parent->css, 0, &it);          (2)
>>                   ...
>>             rcu_read_lock();
>>             css_put(&cp->css);
>>         }
>>
>> May this touch each task the same number of times as its depth within
>> the hierarchy?
>
> I believe the primary reason is that update_parent_subparts_cpumask()
> can potentially run for quite a while, so we don't want to hold the
> rcu_read_lock for too long. There is also the possibility that
> schedule() may be called.

IMHO, the reason should be the same as in the commit below:

commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
Author: Waiman Long <longman@xxxxxxxxxx>
Date:   Wed Feb 2 22:31:03 2022 -0500

    cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

    It was found that a "suspicious RCU usage" lockdep warning was issued
    with the rcu_read_lock() call in update_sibling_cpumasks(). It is
    because the update_cpumasks_hier() function may sleep. So we have
    to release the RCU lock, call update_cpumasks_hier() and reacquire
    it afterward.

    Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
    instead of stating that in the comment.

Thanks both.
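
For illustration only, the pattern discussed in this thread (pin the child with a reference, drop the RCU read lock, call the function that may sleep, retake the lock, drop the reference) can be modelled in plain userspace C. This is a rough sketch, not kernel code: every mock_* name, rcu_depth, and invalidate_children() is a hypothetical stand-in invented here, and the "RCU lock" is just a counter used to assert that the sleeping call runs outside the read-side critical section.

```c
/* Userspace model of "pin, unlock, sleep, relock, unpin".
 * All names below are hypothetical stand-ins, NOT kernel APIs. */
#include <assert.h>
#include <stdbool.h>

static int rcu_depth;	/* mock RCU read-side nesting depth */

static void mock_rcu_read_lock(void)   { rcu_depth++; }
static void mock_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

struct mock_css {
	bool online;		/* child still online? */
	int  refcnt;		/* references taken by this walker */
	bool invalidated;	/* set by the "sleeping" update */
};

/* Models css_tryget_online(): take a reference only if still online. */
static bool mock_tryget_online(struct mock_css *css)
{
	if (!css->online)
		return false;
	css->refcnt++;
	return true;
}

/* Models css_put(): drop the reference taken above. */
static void mock_put(struct mock_css *css)
{
	assert(css->refcnt > 0);
	css->refcnt--;
}

/* Stand-in for a function that may sleep: it must run outside the
 * read-side section, and the object must be pinned by a reference. */
static void mock_may_sleep_update(struct mock_css *css)
{
	assert(rcu_depth == 0);
	assert(css->refcnt > 0);
	css->invalidated = true;
}

/* Walk the children under the mock RCU lock; returns how many were updated. */
static int invalidate_children(struct mock_css *children, int n)
{
	int updated = 0;

	mock_rcu_read_lock();
	for (int i = 0; i < n; i++) {
		struct mock_css *cp = &children[i];

		if (!mock_tryget_online(cp))
			continue;		/* offline: skip without pinning */
		mock_rcu_read_unlock();		/* leave read-side section */
		mock_may_sleep_update(cp);	/* safe: cp is pinned */
		mock_rcu_read_lock();		/* re-enter before continuing */
		mock_put(cp);
		updated++;
	}
	mock_rcu_read_unlock();
	return updated;
}
```

With three children of which the middle one is offline, invalidate_children() returns 2 and leaves every refcnt and rcu_depth at 0. The model deliberately ignores what the reference really protects against in the kernel (the child being freed while the lock is dropped); it only shows the ordering of the calls.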