On Tue, Feb 18, 2020 at 01:13:47PM +0000, Zhoujian (jay) wrote:
> Hi all,
>
> We found that the guest will occasionally soft-lockup when live migrating a
> 60 vCPU, 512GiB huge page and memory sensitive VM. The reason is clear:
> almost all of the vCPUs are waiting for the KVM MMU spin-lock to create 4K
> SPTEs when the huge pages are write protected. This phenomenon is also
> described in this patch set:
> https://patchwork.kernel.org/cover/11163459/
> which aims to handle page faults in parallel more efficiently.
>
> Our idea is to use the migration thread to touch all of the guest memory at
> 4K granularity before enabling dirty logging. To be more specific, we split
> all the PDPE_LEVEL SPTEs into DIRECTORY_LEVEL SPTEs as the first step, and
> then split all the DIRECTORY_LEVEL SPTEs into PAGE_TABLE_LEVEL SPTEs as the
> following step.

IIUC, QEMU will prefer to use huge pages for all the anonymous ramblocks
(please refer to ram_block_add):

    qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);

Another alternative I can think of is to add an extra parameter to QEMU to
explicitly disable huge pages (so that it can even be MADV_NOHUGEPAGE instead
of MADV_HUGEPAGE). However, that would also drag down performance for the
whole lifecycle of the VM. A third option is to add a QMP command to
dynamically turn huge pages on/off for ramblocks globally. I haven't thought
deeply into any of them, but they all seem doable.

Thanks,

-- 
Peter Xu
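
[Illustrative note, not QEMU code] Below is a minimal, self-contained sketch
of what the last two options boil down to on Linux: flipping
MADV_HUGEPAGE / MADV_NOHUGEPAGE on a ramblock's host mapping with madvise(2),
which is what qemu_madvise() wraps. The set_region_thp() helper is invented
here for illustration; only mmap(), madvise() and the MADV_* flags are real
kernel interfaces.

    /*
     * Sketch only: toggle transparent-huge-page eligibility for an
     * anonymous region, roughly what a "disable huge pages" option or a
     * QMP toggle for ramblocks could do under the hood.
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Hypothetical helper: advise the kernel about THP use for one region. */
    static int set_region_thp(void *host, size_t len, bool enable)
    {
        int advice = enable ? MADV_HUGEPAGE : MADV_NOHUGEPAGE;

        if (madvise(host, len, advice) < 0) {
            perror("madvise");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        size_t len = 256 * 1024 * 1024;     /* stand-in for a ramblock */
        void *host = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (host == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Roughly what ram_block_add() does today for anonymous ramblocks. */
        set_region_thp(host, len, true);

        /* What a disable knob or QMP command could do instead, e.g. just
         * before dirty logging is enabled for live migration. */
        set_region_thp(host, len, false);

        memset(host, 0, len);               /* touch the memory, now 4K-backed */
        munmap(host, len);
        return 0;
    }

The trade-off mentioned above shows up here: leaving the region at
MADV_NOHUGEPAGE avoids the huge-page split storm during dirty logging, but the
VM then runs on 4K mappings for as long as the advice stays in place.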