On Fri, Apr 28, 2023, bugzilla-daemon@xxxxxxxxxx wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=217379 > > Bug ID: 217379 > Summary: Latency issues in irq_bypass_register_consumer > Product: Virtualization > Version: unspecified > Hardware: Intel > OS: Linux > Status: NEW > Severity: normal > Priority: P3 > Component: kvm > Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx > Reporter: zhuangel570@xxxxxxxxx > Regression: No > > We found some latency issue in high-density and high-concurrency scenarios, > we are using cloud hypervisor as vmm for lightweight VM, using VIRTIO net and > block for VM. In our test, we got about 50ms to 100ms+ latency in creating VM > and register irqfd, after trace with funclatency (a tool of bcc-tools, > https://github.com/iovisor/bcc), we found the latency introduced by following > functions: > > - irq_bypass_register_consumer introduce more than 60ms per VM. > This function was called when registering irqfd, the function will register > irqfd as consumer to irqbypass, wait for connecting from irqbypass producers, > like VFIO or VDPA. In our test, one irqfd register will get about 4ms > latency, and 5 devices with total 16 irqfd will introduce more than 60ms > latency. > > Here is a simple case, which can emulate the latency issue (the real latency > is lager). The case create 800 VM as background do nothing, then repeatedly > create 20 VM then destroy them after 400ms, every VM will do simple thing, > create in kernel irq chip, and register 15 riqfd (emulate 5 devices and every > device has 3 irqfd), just trace the "irq_bypass_register_consumer" latency, you > will reproduce such kind latency issue. Here is a trace log on Xeon(R) Platinum > 8255C server (96C, 2 sockets) with linux 6.2.20. > > Reproduce Case > https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c > Reproduce log > https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log > > To fix these latencies, I didn't have a graceful method, just simple ideas > is give user a chance to avoid these latencies, like new flag to disable > irqbypass for each irqfd. > > Any suggestion to fix the issue if welcomed. Looking at the code, it's not surprising that irq_bypass_register_consumer() can exhibit high latencies. The producers and consumers are stored in simple linked lists, and a single mutex is held while traversing the lists *and* connecting a consumer to a producer (and vice versa). There are two obvious optimizations that can be done to reduce latency in irq_bypass_register_consumer(): - Use a different data type to track the producers and consumers so that lookups don't require a linear walk. AIUI, the "tokens" used to match producers and consumers are just kernel pointers, so I _think_ XArray would perform reasonably well. - Connect producers and consumers outside of a global mutex. Unfortunately, because .add_producer() and .add_consumer() can fail, and because connections can be established by adding a consumer _or_ a producer, getting the locking right without a global mutex is quite difficult. It's certainly doable to move the (dis)connect logic out of a global lock, but it's going to require a dedicated effort, i.e. not something that can be sketched out in a few minutes (I played around with the code for the better part of an hour trying to do just that and kept running into edge case race conditions).