Re: [Question] performance regression after VM migration due to anon THP split in CoW

On 2024/6/29 17:45, David Hildenbrand wrote:
Hi,

Likely the mailing lists won't like my mail from this Google Mail client ;)

Jinjiang Tu <tujinjiang@xxxxxxxxxx> wrote on Sat, 29 June 2024 at 11:18:

    Hi,

    We noticed a performance regression in the memtester benchmark[1] after
    upgrading the kernel. THP is enabled by default
    (/sys/kernel/mm/transparent_hugepage/enabled is set to "always"). The
    issue arises when we migrate a virtual machine that has 125G total
    memory and 124G free memory to another host and then run the command
    `memtester 120G` in the VM. The benchmark takes about 20 seconds to
    consume 120G of memory in v4.18, but about 160 seconds in v5.10. This
    issue exists in the mainline kernel too.


Simple: use preallocation in QEMU. "prealloc=on" for host memory backends, for example.
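
As a rough illustration of the suggestion above, the following Python sketch assembles QEMU arguments for a preallocated RAM backend. The helper name, the backend id "mem0", and the choice to wire the backend up via `-machine memory-backend=` are assumptions made for this example and are not taken from the thread; only "prealloc=on" on a host memory backend comes from the mail itself.

```python
# Sketch only: build QEMU arguments for a preallocated host memory backend.
# qemu_prealloc_args, the id "mem0" and the -machine wiring are assumptions
# chosen for illustration; adapt them to the actual VM configuration.

def qemu_prealloc_args(size="125G", backend_id="mem0"):
    """Return QEMU arguments that back guest RAM with a preallocated object."""
    backend = f"memory-backend-ram,id={backend_id},size={size},prealloc=on"
    return [
        "-m", size,                                 # guest RAM size
        "-object", backend,                         # host backend, populated up front
        "-machine", f"memory-backend={backend_id}", # use the backend as main RAM
    ]


if __name__ == "__main__":
    print(" ".join(qemu_prealloc_args()))
```

With preallocation the backend is populated when the VM starts, so the guest never reads its way into zeropage mappings, at the cost of committing the full size immediately.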


    We found that commit 3917c80280c9 ("thp: change CoW semantics for
    anon-THP") leads to the performance regression. Since this commit, when
    we trigger a write fault on an anon THP, we split the PMD and allocate
    a 4K page instead of allocating a full anon THP. When a VM is migrated
    (based on qemu[2]), if a page is marked as a zero page in the source
    VM, the destination VM will call mmap and read the region to allocate
    memory, so the region ends up mapped by the zero THP. When we run
    memtester in the destination VM after the migration finishes, memtester
    (in the VM) allocates large amounts of free memory and writes to it,
    which causes CoW of anon THPs and THP splits, and in turn the
    performance regression. After reverting this commit, the performance
    regression disappears.
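
To put the numbers in this report side by side, here is a small back-of-the-envelope calculation in Python. It assumes 2 MiB PMD-sized huge pages and 4 KiB base pages (typical for x86-64, not stated in the thread), reads "120G" as 120 GiB, and assumes every write fault handles exactly one page of the respective size. The function names are made up for this sketch.

```python
# Back-of-the-envelope numbers for the reported regression.
# Assumptions (not from the thread): 4 KiB base pages, 2 MiB huge pages,
# "120G" means 120 GiB, and one write fault populates exactly one page.

GIB = 1 << 30
BASE_PAGE = 4 << 10   # 4 KiB
HUGE_PAGE = 2 << 20   # 2 MiB


def fault_counts(total_bytes):
    """Return (faults with full-THP CoW, faults with split + 4K CoW)."""
    return total_bytes // HUGE_PAGE, total_bytes // BASE_PAGE


def throughput_gib_s(total_bytes, seconds):
    """Average fill rate in GiB/s for the reported wall-clock time."""
    return total_bytes / GIB / seconds


if __name__ == "__main__":
    total = 120 * GIB
    huge_faults, base_faults = fault_counts(total)
    print("write faults, full THP CoW:", huge_faults)
    print("write faults, split + 4K  :", base_faults)
    print("v4.18 rate (GiB/s):", throughput_gib_s(total, 20))
    print("v5.10 rate (GiB/s):", throughput_gib_s(total, 160))
```

Under these assumptions the split path takes 512 times as many write faults for the same amount of memory, which is at least consistent in direction with the roughly 8x longer run time reported above.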


You talk about COW of anon THP, but your scenario really only relies on COW of the huge zeropage.

Wouldn't you get a similar result when disabling the huge zeropage?
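
For checking how a host is currently configured, the sketch below reads the THP "enabled" mode mentioned earlier in the thread and the huge zeropage knob, which is exposed as use_zero_page in the same sysfs directory (writing 0 to it as root disables the huge zeropage). The helper names are hypothetical, and a missing file is reported as None rather than treated as an error.

```python
# Sketch: inspect THP and huge zeropage settings via sysfs.
# parse_selected and read_thp_settings are hypothetical helper names.
import re
from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def parse_selected(text):
    """Return the bracketed choice ("always [madvise] never" -> "madvise"),
    or the stripped text when the file holds a plain value such as "1"."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()


def read_thp_settings(base=THP_DIR):
    """Read the 'enabled' and 'use_zero_page' knobs; None if unreadable."""
    settings = {}
    for name in ("enabled", "use_zero_page"):
        try:
            settings[name] = parse_selected((Path(base) / name).read_text())
        except OSError:
            settings[name] = None
    return settings


if __name__ == "__main__":
    print(read_thp_settings())
```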



    This commit optimises some scenarios such as Redis, but may lead to
    performance regression in some other scenarios, such as VM migration.
    How could we solve this issue? Maybe we could add a new sysctl to let
    users decide whether to CoW the full anon THP or not?


I'm not convinced the use case you present really warrants a toggle for that. In your case you only want to change semantics on COW fault to the huge zeropage. But …

Using preallocation in QEMU will give you all anon THP right from the start, avoiding any COW. Sure, you consume all memory right away, but after all that's what your use case triggers either way. And it might all be even faster. :)

Cheers!

Thanks for the reply. Both methods work, but both lead to large memory consumption even though the VM doesn't need that much memory right now.


    Thanks.

    [1] https://github.com/jnavila/memtester/tree/master
    [2] https://github.com/qemu/qemu/blob/master/migration/ram.c





