The reported problem here occurs when a cgroup hierarchy is unmounted
quickly after the last cgroup is removed. The last cgroup prevents the
root cgroup css->refcnt from being killed. The respective cgroup root
thus remains permanently in existence.

This is actually the intended behavior for the memory controller, whose
state is long-lived and for which there is no better option to attach it
later (see also commit 3c606d35fe97 ("cgroup: prevent mount hang due to
memory controller lifetime")).

We can make the situation better by checking the children list only
after any cgroups in the middle of removal are gone, which is detected
by flushing cgroup_destroy_wq.

Reported-by: Bui Quang Minh <minhquangbui99@xxxxxxxxx>
Link: https://lore.kernel.org/r/20220404142535.145975-1-minhquangbui99@xxxxxxxxx
Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
---
 kernel/cgroup/cgroup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index adb820e98f24..a5b0d5d54fbc 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2205,11 +2205,14 @@ static void cgroup_kill_sb(struct super_block *sb)
 	struct cgroup_root *root = cgroup_root_from_kf(kf_root);
 
 	/*
-	 * If @root doesn't have any children, start killing it.
+	 * If @root doesn't have any children held by residual state (e.g.
+	 * memory controller), start killing it, flush workqueue to filter out
+	 * transiently offlined children.
 	 * This prevents new mounts by disabling percpu_ref_tryget_live().
 	 *
 	 * And don't kill the default root.
 	 */
+	flush_workqueue(cgroup_destroy_wq);
 	if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
 	    !percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
 		cgroup_bpf_offline(&root->cgrp);
-- 
2.35.3