Re: memory cgroup pagecache and inode problem

> On Jan 6, 2019, at 05:09, Roman Gushchin <guro@xxxxxx> wrote:
> 
> On Fri, Jan 04, 2019 at 12:43:40PM +0800, Fam Zheng wrote:
>> Hi,
>> 
>> In our server, which frequently spawns containers, we find that if a process used page cache inside a memory cgroup, then after the process exits and the memory cgroup is offlined, the page cache is still charged to that cgroup, so the cgroup cannot be destroyed until the page cache is dropped. This builds up huge memory pressure over time. We have seen over one hundred thousand such offlined memory cgroups on a single system, holding too much memory (~100G). This memory cannot be released immediately even after all the associated page cache is released, because those memory cgroups are destroyed asynchronously by a kworker. In some cases this can cause an OOM, since synchronous memory allocations fail in the meantime.
>> 
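
A quick way to quantify this: recent cgroup v2 kernels (4.14+, if I remember correctly) expose an nr_dying_descendants counter in cgroup.stat. A minimal userspace sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* cgroup.stat lines look like "nr_dying_descendants 123" */
	FILE *f = fopen("/sys/fs/cgroup/cgroup.stat", "r");
	char key[64];
	long val;

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fscanf(f, "%63s %ld", key, &val) == 2) {
		if (!strcmp(key, "nr_dying_descendants"))
			printf("dying cgroups: %ld\n", val);
	}
	fclose(f);
	return 0;
}
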
>> We think a fix is to create a kworker that scans all page caches, dentry caches, etc. in the background; if a referenced memory cgroup is offline, it tries to drop the cache or move the charge to the parent cgroup. This kworker could wake up periodically, on memory cgroup offline events, or both.
>> 
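
To make the idea concrete, here is a rough, untested sketch of such a worker. for_each_dying_memcg() is hypothetical (no such helper exists today); try_to_free_mem_cgroup_pages() is the kernel's existing memcg-targeted reclaim entry point:

#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/memcontrol.h>
#include <linux/swap.h>

#define DYING_SCAN_INTERVAL	(60 * HZ)

static void dying_memcg_scan(struct work_struct *work);
static DECLARE_DELAYED_WORK(dying_scan_work, dying_memcg_scan);

static void dying_memcg_scan(struct work_struct *work)
{
	struct mem_cgroup *memcg;

	/*
	 * Hypothetical iterator over memcgs that are offline but
	 * still pinned by charged page cache, dentries, inodes, etc.
	 */
	for_each_dying_memcg(memcg) {
		/* Try to reclaim everything that still pins this memcg. */
		try_to_free_mem_cgroup_pages(memcg, ULONG_MAX,
					     GFP_KERNEL, false);
	}

	schedule_delayed_work(&dying_scan_work, DYING_SCAN_INTERVAL);
}
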
>> There is a similar problem with inodes. After digging into the ext4 code, we found that the inode cache is created with SLAB_ACCOUNT, so inodes are allocated from a slab charged to the current memory cgroup. After this memory cgroup goes offline, an inode may still be held by a dentry cache. If another process uses the same file, the inode is held by that process too, preventing the previous memory cgroup from being destroyed until that other process closes the file and the dentry cache is dropped.
>> 
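
For reference, here is roughly where that charging comes from: fs/ext4/super.c creates the inode cache with SLAB_ACCOUNT (lightly trimmed below), so every ext4 inode allocation is charged to the allocating task's memcg:

static struct kmem_cache *ext4_inode_cachep;

static int __init init_inodecache(void)
{
	/* SLAB_ACCOUNT charges each allocated inode to the current memcg. */
	ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
				sizeof(struct ext4_inode_info), 0,
				(SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
				 SLAB_ACCOUNT),
				init_once);
	if (ext4_inode_cachep == NULL)
		return -ENOMEM;
	return 0;
}
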
>> We still don't have a reasonable way to fix this.
>> 
>> Ideas?
> 
> Hi, Fam!

Hi!

> 
> Which kernel version are you on?

We’ve seen the issue in a range of versions from 4.4 to 4.19.

> 
> I made some changes recently to fix a memcg "leak", or, better to say, to make
> memcg reclaim possible under normal conditions. Before that we were accumulating
> a large number of dying cgroups, which matches your description.

Is there a commit id?

Fam

> 
> 100000 dying cgroups sounds scary; it shouldn't be that way if there is any
> memory pressure.
> 
> Thanks!




