Re: RFC: Split EPT huge pages in advance of dirty logging

On Tue, Feb 18, 2020 at 01:13:47PM +0000, Zhoujian (jay) wrote:
> Hi all,
> 
> We found that the guest will occasionally soft-lockup when live migrating a 60 vCPU,
> 512GiB, huge-page-backed and memory-sensitive VM. The reason is clear: almost all of the
> vCPUs are waiting for the KVM MMU spin-lock to create 4K SPTEs when the huge pages are
> write protected. This phenomenon is also described in this patch set:
> https://patchwork.kernel.org/cover/11163459/
> which aims to handle page faults in parallel more efficiently.
> 
> Our idea is to use the migration thread to touch all of the guest memory at 4K
> granularity before enabling dirty logging. To be more specific, we split all the
> PDPE_LEVEL SPTEs into DIRECTORY_LEVEL SPTEs as the first step, and then split all
> the DIRECTORY_LEVEL SPTEs into PAGE_TABLE_LEVEL SPTEs as the second step.
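
The two-step split described above might look roughly like the sketch below. This is an
illustration only, not code from the posted series: split_spte_at_gfn() is a hypothetical
helper, and locking, memslot iteration and TLB flushing are all omitted.

/*
 * Illustrative sketch only -- not the code from the RFC.  The idea is a
 * two-pass walk over a memslot, done before dirty logging is enabled:
 * first split every 1G (PDPE_LEVEL) SPTE into 2M (DIRECTORY_LEVEL) SPTEs,
 * then split every 2M SPTE into 4K (PAGE_TABLE_LEVEL) SPTEs, so that
 * write protection never has to break huge mappings from the vCPU fault
 * path.  split_spte_at_gfn() is a made-up helper standing in for whatever
 * the real series does under the MMU lock.
 */
static void pre_split_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
	gfn_t gfn;

	/* Pass 1: walk in 1G strides, splitting PDPE_LEVEL SPTEs to 2M. */
	for (gfn = slot->base_gfn; gfn < slot->base_gfn + slot->npages;
	     gfn += KVM_PAGES_PER_HPAGE(PT_PDPE_LEVEL))
		split_spte_at_gfn(kvm, slot, gfn, PT_DIRECTORY_LEVEL);

	/* Pass 2: walk in 2M strides, splitting DIRECTORY_LEVEL SPTEs to 4K. */
	for (gfn = slot->base_gfn; gfn < slot->base_gfn + slot->npages;
	     gfn += KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL))
		split_spte_at_gfn(kvm, slot, gfn, PT_PAGE_TABLE_LEVEL);
}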

IIUC, QEMU will prefer to use huge pages for all the anonymous
ramblocks (please refer to ram_block_add):

        qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);

Another alternative I can think of is to add an extra parameter to
QEMU to explicitly disable huge pages (so that the hint can even be
MADV_NOHUGEPAGE instead of MADV_HUGEPAGE).  However, that would also
drag down performance for the whole lifecycle of the VM.  A third
option is to add a QMP command to dynamically turn huge pages on/off
for ramblocks globally.  I haven't thought deeply about any of them,
but they all seem doable.
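
As a purely illustrative example of the second option, the hint could be flipped per
ramblock based on a hypothetical "nohugepage" knob.  The snippet below uses plain
madvise() for self-containment; QEMU itself would go through its qemu_madvise()
wrappers instead:

#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Illustration only: "nohugepage" is an invented option, not an existing
 * QEMU parameter.  When set, the anonymous ramblock is explicitly opted
 * out of THP instead of being hinted towards it.
 */
static void hint_ramblock_thp(void *host, size_t max_length, bool nohugepage)
{
	madvise(host, max_length,
		nohugepage ? MADV_NOHUGEPAGE : MADV_HUGEPAGE);
}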

Thanks,

-- 
Peter Xu



