[BUG] The usage of memory cgroup is not consistent with processes when using THP

Hi folks,
We found that the usage counter of a container's memory cgroup (v1) is
not consistent with the memory usage of its processes when THP is in use.

The problem was introduced by upstream commit 0a31bc97c80 and still
exists in Linux 5.14.5.
The root cause is that mem_cgroup_uncharge was moved to the final
put_page(). When part of a huge page is freed under THP, the memory
usage of the process is updated when the PTEs are unmapped, but the
usage counter of the memory cgroup is only updated when the huge page
is split in deferred_split_scan. This causes the inconsistency, and we
have seen differences of more than 30 GB in our daily usage.
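
To make the two numbers being compared concrete, here is a minimal sketch
of how the gap could be computed from the text of /proc/<pid>/status and
of the cgroup v1 file memory.usage_in_bytes. The function names are made
up for illustration and are not part of the attached test. Note that
usage_in_bytes also counts page cache, and that a cgroup with several
processes would need their RSS summed, so treat the result as a rough
indicator only.

```c
#include <stdlib.h>
#include <string.h>

/* Find a line starting with `key` (e.g. "VmRSS:") in the text of
 * /proc/<pid>/status and return its value in kB, or -1 if absent.
 * Hypothetical helper, named for illustration only. */
long status_field_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, keylen) == 0)
            return strtol(line + keylen, NULL, 10); /* skips blanks/tabs */
        line = strchr(line, '\n');
        if (line)
            line++;                                 /* start of next line */
    }
    return -1;
}

/* Return (cgroup usage - process RSS) in bytes.
 * status_text: contents of /proc/<pid>/status
 * usage_text:  contents of memory.usage_in_bytes (a single number)
 * Returns -1 if VmRSS is not present in status_text. */
long long cgroup_minus_rss_bytes(const char *status_text,
                                 const char *usage_text)
{
    long rss_kb = status_field_kb(status_text, "VmRSS:");

    if (rss_kb < 0)
        return -1;
    return strtoll(usage_text, NULL, 10) - (long long)rss_kb * 1024;
}
```

A caller would read both files into buffers and pass them in. A gap that
stays large after the madvise step, and shrinks once the deferred split
queue is drained, would match the behavior described above.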

It can be reproduced with the following program and script.
The program, "eat_memory_release", allocates memory in 8 MB regions and
releases the last 1 MB of each region using madvise.
The script "test_thp.sh" creates a memory cgroup, runs
"eat_memory_release 500" in it, and repeats the process 10 times. The
output shows the change in memory usage, which in theory should be a
drop of about 500 MB.
With THP the output varies randomly, while adding "echo 2
> /proc/sys/vm/drop_caches" before the accounting avoids this.

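For reference, the expected numbers for a run with num = 500 can be
worked out as below. This is only a sketch of the arithmetic; the helper
names are made up. The "lingering" figure is an upper bound under the
assumption that every released 1 MB tail sits inside a huge page that
stays charged until it is split. Tails backed by small pages would be
uncharged immediately.

```c
/* Sizes taken from the test program, in MB. */
#define REGION_MB   8UL   /* each mmap() region */
#define RELEASED_MB 1UL   /* tail of each region passed to madvise() */

/* Memory mapped and touched after the allocation loop. */
unsigned long mapped_after_alloc_mb(unsigned long num)
{
    return num * REGION_MB;
}

/* Memory the process should still hold after the madvise loop. */
unsigned long mapped_after_madvise_mb(unsigned long num)
{
    return num * (REGION_MB - RELEASED_MB);
}

/* Upper bound on memory that is no longer mapped by the process but may
 * still be charged to the cgroup while the partially unmapped huge
 * pages wait to be split (assumption stated in the text above). */
unsigned long max_lingering_charge_mb(unsigned long num)
{
    return num * RELEASED_MB;
}
```

With num = 500 this gives 4000 MB after allocation, 3500 MB after
madvise, and up to 500 MB that the cgroup counter may still report.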
Is there a patch that fixes this, or is this behavior by design?

Thanks,
Yunfang Tai
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

/*
 * Allocate <num> anonymous 8 MB regions, touch them, release the last
 * 1 MB of each region with madvise(MADV_DONTNEED), then unmap everything.
 */
int main(int argc, char* argv[])
{
    char* memindex[1000] = {0};
    int eat = 0;
    int i = 0;

    if (argc < 2)  {
        printf("Usage: ./eat_memory_release <num>   #allocate num * 8 MB and free num MB memory\n");
        return 1;
    }

    if (sscanf(argv[1], "%d", &eat) != 1 || eat <= 0 || eat >= 1000) {
        printf("num should be larger than 0 and less than 1000\n");
        return 1;
    }
    printf("Allocate memory in MB size: %d\n", eat * 8);

    printf("Allocation memory Begin!\n");
    for (i = 0; i < eat; i++) {
        memindex[i] = (char*)mmap(NULL, 8*1024*1024, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        if (memindex[i] == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* Touch every page so the region is actually populated. */
        memset(memindex[i], 0, 8*1024*1024);
    }

    printf("Allocation memory Done!\n");
    sleep(2);
    printf("Now begin to madvise free memory!\n");
    for (i = 0; i < eat; i++) {
        /* Give back the last 1 MB of each 8 MB region. */
        madvise(memindex[i] + 7*1024*1024, 1024*1024, MADV_DONTNEED);
    }
    sleep(5);
    printf("Now begin to release memory!\n");
    for (i = 0; i < eat; i++) {
        munmap(memindex[i], 8*1024*1024);
    }

    return 0;
}

Attachment: test_thp.sh
Description: Bourne shell script

