Re: [PATCH] cgroup: wait for css offline when rmdir

Tejun Heo <tj@xxxxxxxxxx> · Tue, 31 May 2022 07:19:31 -1000

Hello,

On Tue, May 31, 2022 at 11:49:53AM +0800, Hongchen Zhang wrote:
> Yes, the problem would disappear when add some reasonable delay. But I think

It'd be better to wait for some group of operations to complete than
inserting explicit delays.

> if we can increase the MEM_CGROUP_ID_MAX to INT_MAX.Thus the -ENOMEM error
> would be never occured,even if the system is out of memory.

Oh, you're hitting the memcg ID limit, not the css one. Memcg id is limited
so that it doesn't consume as many bits in, I guess, struct page. I don't
think it'd make sense to increase overall overhead to solve this rather
artificial problem tho.

Maybe just keep the sequence numbers for started and completed offline
operations and wait for completed# to reach the started# on memcg alloc
failure and retry? Note that we can get live locked, so have to remember the
sequence number to wait for at the beginning. Or, even simpler, maybe it'd
be enough to just do synchronize_rcu() and then wait for the offline wait
once and retry.

Thanks.

-- 
tejun