Hello,

On Tue, May 31, 2022 at 11:49:53AM +0800, Hongchen Zhang wrote:
> Yes, the problem would disappear when adding some reasonable delay. But I think

It'd be better to wait for some group of operations to complete than to insert explicit delays.

> if we can increase the MEM_CGROUP_ID_MAX to INT_MAX, thus the -ENOMEM error
> would never occur, even if the system is out of memory.

Oh, you're hitting the memcg ID limit, not the css one. The memcg ID is limited so that it doesn't consume too many bits in, I guess, struct page. I don't think it'd make sense to increase overall overhead to solve this rather artificial problem tho.

Maybe just keep sequence numbers for started and completed offline operations and, on memcg alloc failure, wait for completed# to reach started# and retry? Note that we can get livelocked, so we have to remember the sequence number to wait for at the beginning.

Or, even simpler, maybe it'd be enough to just do synchronize_rcu(), then wait for the pending offlines once and retry.

Thanks.

-- 
tejun