https://bugzilla.kernel.org/show_bug.cgi?id=217380

            Bug ID: 217380
           Summary: Latency issues in creating kvm-nx-lpage-recovery kthread
           Product: Virtualization
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: zhuangel570@xxxxxxxxx
        Regression: No

Hi,

We found latency issues in high-density, high-concurrency scenarios. We use
Cloud Hypervisor as the VMM for lightweight VMs, with virtio net and block
devices.

In our tests we saw about 50ms to 100ms+ of latency when creating a VM and
registering an irqfd. After tracing with funclatency (a tool from bcc-tools,
https://github.com/iovisor/bcc), we found that the latency is introduced by
the following functions:

- kvm_vm_create_worker_thread introduces tail latency of more than 100ms.
  This function is called to create the "kvm-nx-lpage-recovery" kthread when
  a new VM is created. The kthread was introduced to recover large pages, to
  relieve the performance loss caused by the software mitigation of
  ITLB_MULTIHIT; see b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and
  1aa9b9572b10 ("kvm: x86: mmu: Recovery of shattered NX large pages").

Here is a simple case that emulates the latency issue (the real latency is
larger). It creates 800 VMs that sit idle in the background, then repeatedly
creates 20 VMs and destroys them after 400ms. Tracing the latency of the two
functions while it runs reproduces this kind of latency.

Here is a trace log from a Xeon(R) Platinum 8255C server (96C, 2 sockets)
with Linux 6.2.20.

Reproduce Case
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c

Reproduce log
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log

I don't have a graceful fix for these latencies. A simple idea is to give
the user a chance to avoid them, for example a module parameter that disables
the "kvm-nx-lpage-recovery" kthread.

Any suggestions for fixing the issue are welcome.

Thanks!
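For readers who want a quick look at VM-creation latency without bcc, the measurement can be approximated from user space. The following is only a minimal sketch and is not the linked kvm_irqfd_fork.c reproducer: it times the whole KVM_CREATE_VM ioctl, of which kvm_vm_create_worker_thread is just one part, and it does not set up the 800 background VMs. The helper names (percentile, summarize, time_vm_creates) are hypothetical, and the ioctl number is assumed to match _IO(KVMIO, 0x01) from <linux/kvm.h>.

```python
import fcntl
import os
import time

# Assumed value of KVM_CREATE_VM, i.e. _IO(KVMIO, 0x01) with KVMIO = 0xAE.
KVM_CREATE_VM = 0xAE01


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, (len(sorted_ms) * pct + 99) // 100)  # ceil(n * pct / 100)
    return sorted_ms[rank - 1]


def summarize(latencies_ms):
    """Reduce raw per-call latencies (ms) to count, median, tail and max."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


def time_vm_creates(count, hold_seconds=0.4):
    """Create `count` empty VMs, timing each ioctl, then destroy them.

    hold_seconds mirrors the 400ms lifetime described in the report.
    """
    latencies_ms = []
    vm_fds = []
    kvm_fd = os.open("/dev/kvm", os.O_RDWR)
    try:
        for _ in range(count):
            start = time.perf_counter()
            # Argument 0 selects the default machine type; returns a VM fd.
            vm_fds.append(fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0))
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
        time.sleep(hold_seconds)
    finally:
        for fd in vm_fds:
            os.close(fd)  # closing the last reference destroys the VM
        os.close(kvm_fd)
    return latencies_ms


if __name__ == "__main__":
    if os.access("/dev/kvm", os.R_OK | os.W_OK):
        print(summarize(time_vm_creates(20)))
    else:
        print("/dev/kvm not accessible; nothing measured")
```

Running several copies of this concurrently, with many idle VMs already present, is closer to the scenario in the report. Per-function numbers such as those for kvm_vm_create_worker_thread still need a kernel-side tracer like funclatency.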