Hello, On Thu, Jun 01, 2023 at 07:33:53PM +0200, Michal Koutný wrote: > > Inside css_task_iter_start/next/end, css_set_lock is hold and then > > released, so when iterating task(left side), the css_set may be moved to > > another list(right side), then it->cset_head points to the old list head > > and it->cset_pos->next points to the head node of new list, which can't > > be used as struct css_set. > > I find your analysis sane -- the stale it->cset_head is problematic. > > > To fix this issue, introduce CSS_TASK_ITER_STOPPED flag for css_task_iter. > > when moving css_set to dcgrp->e_csets[ss->id] in rebind_subsystems(), stop > > the task iteration. > > Does it mean that iteration would not yield all tasks that are > associated with give cs->css? That sounds like broken correctness of the > iterator. > > I may suggest a slightly different approach that should not affect > running iterators. > - I had to switch from all css_sets to only scgrp's css_sets since > css_set_table order of css_sets may be different than scgrp->e_csets > - Not sure how portable is using array element as a `member` argument of > offsetof (in expansion of list_for_each_entry_safe). Both look fine to me. > This is only to illustrate the idea, i.e. merely compile tested. > > WDYT? > > Regards, > Michal > > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index 625d7483951c..e67d2a0776c1 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -1798,7 +1798,7 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > { > struct cgroup *dcgrp = &dst_root->cgrp; > struct cgroup_subsys *ss; > - int ssid, i, ret; > + int ssid, ret; > u16 dfl_disable_ss_mask = 0; > > lockdep_assert_held(&cgroup_mutex); > @@ -1842,7 +1842,8 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > struct cgroup_root *src_root = ss->root; > struct cgroup *scgrp = &src_root->cgrp; > struct cgroup_subsys_state *css = cgroup_css(scgrp, ss); > - struct css_set *cset; > + struct css_set *cset, *cset_pos; > + struct css_task_iter *it; > > WARN_ON(!css || cgroup_css(dcgrp, ss)); > > @@ -1860,9 +1861,18 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > css->cgroup = dcgrp; > > spin_lock_irq(&css_set_lock); > - hash_for_each(css_set_table, i, cset, hlist) > + WARN_ON(!list_empty(&dcgrp->e_csets[ss->id])); > + list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id], e_cset_node[ss->id]) { > list_move_tail(&cset->e_cset_node[ss->id], > &dcgrp->e_csets[ss->id]); > + /* all css_sets of scgrp together in same order to dcgrp, > + * patch in-flight iterators to preserve correct iteration, > + * cset_head is under css_set_lock */ > + list_for_each_entry(it, &cset->task_iters, iters_node) { > + if (it->cset_head == &scgrp->e_csets[ss->id]) > + it->cset_head = &dcgrp->e_csets[ss->id]; This looks fine to me too but this is rather subtle and it'd worth explaining further. e.g. on the first glance, it may also seem like we'd also need to update it->cset_pos and friends because they get initialized to e_csets[]. This isn't the case because the iterator is always advanced right away and doesn't remain pointing to the head, but it is kinda tricky. Thanks. -- tejun