On Tue 07-12-21 20:08:43, Michal Koutný wrote: > On Wed, Dec 01, 2021 at 02:34:39PM +0100, Jan Kara <jack@xxxxxxx> wrote: > > After some analysis we've found out that the culprit of the problem is > > that some task is reparented from cgroup G to the root cgroup and G is > > offlined. > > Just sharing my interpretation for context -- (I saw this was a system > using the unified cgroup hierarchy, io_cgrp_subsys_on_dfl_key was > enabled) and what was observed could also have been disabling the io > controller on given level -- that would also manifest similarly -- the > task is migrated to parent and the former blkcg is offlined. Yes, that's another possibility. > > +static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg) > > [...] > > - bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > > [...] > > + hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node) > > + bfq_bfqq_move(bfqd, bfqq, bfqd->root_group); > > Here I assume root_group is (representing) the global blkcg root and > this reparenting thus skips all ancestors between the removed leaf and > the root. IIUC the associated io_context would then be treated as if it > was running in the root blkcg. > (Admittedly, this isn't a change from this patch but it may cause some > surprises if the given process runs after the operation.) Yes, this is what happens in bfq_reparent_children() and basically preserves what BFQ was already doing for a subset of bfq queues. > Reparenting to the immediate ancestors should be safe as cgroup core > should ensure children are offlined before parents. Would it make sense > to you? I suppose yes, it makes more sense to reparent just to immediate parents instead of the root of the blkcg hierarchy. Initially when developing the patch I was not sure whether parent has to be still alive but as you write it should be safe. I'll modify the patch to: static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg) { struct bfq_queue *bfqq; struct hlist_node *next; struct bfq_group *parent; parent = bfqg_parent(bfqg); if (!parent) parent = bfqd->root_group; hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node) bfq_bfqq_move(bfqd, bfqq, parent); } > > @@ -897,38 +844,17 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) > > [...] > > - * It may happen that some queues are still active > > - * (busy) upon group destruction (if the corresponding > > - * processes have been forced to terminate). We move > > - * all the leaf entities corresponding to these queues > > - * to the root_group. > > This comment is removed but it seems to me it assumed that the > reparented entities are only some transitional remainings of terminated > tasks but they may be the processes migrated upwards with a long (IO > active) life ahead. Yes, this seemed to be a misconception of the old code... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR