[Question]: pagecache thrashing and hard to trigger OOM in cgroup

Liu Shixin <liushixin2@xxxxxxxxxx> · Wed, 22 Nov 2023 11:26:07 +0800



    Hi everyone,

    
    Recently, we hit an IO performance issue caused by pagecache thrashing in
    a cgroup, and we found that it was introduced by commit 815744d75152 ("mm:
    memcontrol: don't batch updates of local VM stats and events").

    
    The problem can easily be reproduced in a docker environment. First, create
    a container with a 4G memory limit and a 2G swap limit, then run a program
    that allocates (6G - 50M) of anon memory, so that only 50M of memory remains
    usable and there is no swap space left. Then run "yum install gcc": the yum
    program thrashes and IO stays high for a long time, but no OOM is triggered.
    This affects other processes and containers on the machine.
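
The allocation step can be sketched roughly as follows. This is only an illustration, not the exact program used for the report: the function name `hog_anon_memory` and the constants are assumptions. It maps private anonymous memory and touches every page so that the memory is actually charged to the cgroup.

```python
import mmap

GIB = 1 << 30
MIB = 1 << 20
PAGE = mmap.PAGESIZE

# Size from the report: with a 4G memory limit and a 2G swap limit,
# allocating (6G - 50M) leaves roughly 50M of headroom and no swap.
DEFAULT_SIZE = 6 * GIB - 50 * MIB


def hog_anon_memory(nbytes):
    """Map nbytes of private anonymous memory and touch every page.

    Hypothetical helper for illustration. MAP_PRIVATE is requested
    explicitly because Python's default (MAP_SHARED) would create
    shmem-backed pages rather than anon pages.
    """
    region = mmap.mmap(
        -1,
        nbytes,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    # Write one byte per page so each page is faulted in and charged.
    for off in range(0, nbytes, PAGE):
        region[off] = 1
    return region
```

To reproduce, one would call `hog_anon_memory(DEFAULT_SIZE)` inside the container and keep the process alive (for example by sleeping in a loop) while running the yum command in the same container.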

    
    After analysis, we found a large number of readahead failures during this
    time. Since page allocations from pagecache readahead carry the
    __GFP_NORETRY flag, the OOM is skipped when the memcg limit is reached. The
    pagecache is repeatedly allocated and reclaimed, and the value of
    workingset_refault_file is high. These readaheads take a lot of time,
    consume a lot of IO throughput, and impact the entire system. This continues
    for a long time, until some other page allocation triggers the OOM.
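
One way to watch the refault counter during the reproduction is to sample the cgroup's memory.stat file twice and compare. The sketch below is an illustration only: the helper names are hypothetical, and the cgroup directory passed to `read_memory_stat` is an assumption that depends on how the container's cgroup is mounted (for example, somewhere under /sys/fs/cgroup on a cgroup v2 host).

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_memory_stat(cgroup_dir):
    """Read and parse memory.stat from a cgroup directory (path is an assumption)."""
    with open(f"{cgroup_dir}/memory.stat") as f:
        return parse_memory_stat(f.read())


def refault_delta(before, after, key="workingset_refault_file"):
    """Return how much the given counter grew between two samples."""
    return after.get(key, 0) - before.get(key, 0)
```

Taking one sample before and one after a few seconds of the yum run, a large value from `refault_delta` would indicate that file pages are being evicted and read back repeatedly, which matches the thrashing described above.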

    
    By bisection, we finally found commit 815744d75152 ("mm: memcontrol: don't
    batch updates of local VM stats and events"). Before that commit, the
    process triggers OOM within a very short time. We suspect the difference is
    caused by performance changes.

    
    Is there any good way to fix the problem? We would prefer the process to be
    OOM-killed rather than have the system hang and affect other processes.

    
    Thanks,