Latency issues inside KVM.

Hi

We found some latency issues in high-density, high-concurrency scenarios. We
are using Cloud Hypervisor as the VMM for lightweight VMs, with virtio net and
block devices. In our tests, creating a VM and registering its irqfds took
about 50ms to 100ms+. After tracing with funclatency (a tool from bcc-tools,
https://github.com/iovisor/bcc), we found that the latency is introduced by
the following functions:

- irq_bypass_register_consumer introduces more than 60ms per VM.
  This function is called when registering an irqfd. It registers the irqfd
  as a consumer with irqbypass so that it can be connected to irqbypass
  producers such as VFIO or vDPA. In our test, registering one irqfd takes
  about 4ms, so 5 devices with 16 irqfds in total (16 x 4ms = 64ms) introduce
  more than 60ms of latency.

- kvm_vm_create_worker_thread introduces tail latency of more than 100ms.
  This function is called to create the "kvm-nx-lpage-recovery" kthread when
  a new VM is created. The kthread was introduced to recover large pages and
  so relieve the performance loss caused by the software mitigation for
  ITLB_MULTIHIT, see b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and
  1aa9b9572b10 ("kvm: x86: mmu: Recovery of shattered NX large pages").

Here is a simple case that emulates the latency issue (the real latency is
larger). The case creates 800 VMs that sit idle in the background, then
repeatedly creates 20 VMs and destroys them after 400ms. Each VM does only
simple things: it creates the in-kernel irqchip and registers 15 irqfds
(emulating 5 devices with 3 irqfds each). Tracing the latency of the two
functions above while the case runs reproduces the issue. Below is a trace
log from a Xeon(R) Platinum 8255C server (96C, 2 sockets) with linux 6.2.20.

Reproduce Case
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
Reproduce log
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log

I don't have a graceful way to fix these latencies. My only simple idea is to
give the user a chance to avoid them, for example with a module parameter
that disables the "kvm-nx-lpage-recovery" kthread and a new flag that
disables irqbypass for an individual irqfd.

Any suggestions for fixing the issue are welcome.

Thanks!

-- 
——————————
   zhuangel570
——————————
