Hi all,

We found that the guest occasionally hits soft lockups when live migrating a 60 vCPU, 512 GiB VM that is backed by huge pages and is memory sensitive. The reason is clear: almost all of the vCPUs are waiting on the KVM MMU spinlock to create 4K SPTEs once the huge pages are write-protected. This phenomenon is also described in this patch set, which aims to handle page faults in parallel more efficiently:
https://patchwork.kernel.org/cover/11163459/

Our idea is to use the migration thread to touch all of the guest memory at 4K granularity before enabling dirty logging. More specifically, the first step splits all the PDPE_LEVEL SPTEs into DIRECTORY_LEVEL SPTEs, and the second step splits all the DIRECTORY_LEVEL SPTEs into PAGE_TABLE_LEVEL SPTEs (a rough sketch of this two-pass flow is appended after my signature).

However, there is a side effect: clearing the D-bits of the last-level SPTEs when enabling dirty logging takes longer, and that path holds the QEMU BQL and the KVM mmu_lock simultaneously. To address this, enabling dirty logging gradually in small chunks has also been proposed; here is the link to v1:
https://patchwork.kernel.org/patch/11388227/

On an Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz host, we ran some tests with the demo we wrote, using a 60 vCPU, 256 GiB VM with NUMA balancing enabled. We started a process with 60 threads that randomly touches most of the memory in the VM (a simplified sketch of this workload is also appended below), and measured the execution time of a guest function during live migration. change_prot_numa() was chosen because it does not release the CPU until its work has finished. Here are the numbers:

                       Original                     With our demo
  [1]                  > 9s (most of the time)      ~5ms
  Hypervisor cost      > 90%                        ~3%

  [1]: execution time of the change_prot_numa() function

If the time in [1] exceeds 20s, the result is a soft lockup.

I know it is a little hacky to do this, but my question is: is it worth trying to split EPT huge pages in advance of dirty logging? Any advice will be appreciated, thanks.

Regards,
Jay Zhou
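
For illustration only, here is a rough sketch of the two-pass splitting flow mentioned above. kvm_split_hugepage_sptes_to_level() is a made-up name standing in for the actual splitting logic, not an existing KVM function, and locking plus per-address-space memslot handling are omitted; only the memslot iteration and the level constants are real.

/*
 * Illustration only: kvm_split_hugepage_sptes_to_level() is a hypothetical
 * helper standing in for the real splitting logic.  Locking and address
 * space handling are omitted; this just shows the two-pass shape of the
 * idea, not an actual implementation.
 */
static void kvm_presplit_all_memslots(struct kvm *kvm)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	struct kvm_memory_slot *memslot;

	kvm_for_each_memslot(memslot, slots) {
		/* Pass 1: split 1G (PDPE_LEVEL) SPTEs down to 2M */
		kvm_split_hugepage_sptes_to_level(kvm, memslot,
						  PT_DIRECTORY_LEVEL);
		/* Pass 2: split 2M (DIRECTORY_LEVEL) SPTEs down to 4K */
		kvm_split_hugepage_sptes_to_level(kvm, memslot,
						  PT_PAGE_TABLE_LEVEL);
	}
}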
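
And here is a simplified sketch of the guest-side workload used for the measurement. The thread count matches the test, but the region size and the random-touch pattern below are illustrative rather than the exact demo code.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>

#define NTHREADS  60                 /* one toucher per vCPU */
#define REGION_SZ (200UL << 30)      /* "most of" a 256 GiB guest */
#define PAGE_SZ   4096UL

static char *region;

/* Each thread keeps dirtying random 4K pages of the shared region. */
static void *toucher(void *arg)
{
	unsigned int seed = (unsigned long)arg;

	for (;;) {
		unsigned long page = (((unsigned long)rand_r(&seed) << 16) ^
				      rand_r(&seed)) % (REGION_SZ / PAGE_SZ);

		region[page * PAGE_SZ] ^= 1;
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long i;

	region = mmap(NULL, REGION_SZ, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return 1;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, toucher, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	return 0;
}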