Hi Oliver,

On Wed, 24 Jan 2024 20:48:54 +0000,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> The unfortunate reality is there are increasingly large systems that are
> shipping today without support for GICv4 vLPI injection. Serialization
> in KVM's LPI routing/injection code has been a significant bottleneck
> for VMs on these machines when under a high load of LPIs (e.g. a
> multi-queue NIC).
>
> Even though the long-term solution is quite clearly **direct
> injection**, we really ought to do something about the LPI scaling
> issues within KVM.
>
> This series aims to improve the performance of LPI routing/injection in
> KVM by moving readers of LPI configuration data away from the
> lpi_list_lock in favor of using RCU.
>
> Patches 1-5 change out the representation of LPIs in KVM from a
> linked-list to an xarray. While not strictly necessary for making the
> locking improvements, this seems to be an opportune time to switch to a
> data structure that can actually be indexed.
>
> Patches 6-10 transition vgic_get_lpi() and vgic_put_lpi() away from
> taking the lpi_list_lock in favor of using RCU for protection. Note that
> this requires some rework to the way references are taken on LPIs and
> how reclaim works to be RCU safe.
>
> Lastly, patches 11-15 rework the LRU policy on the LPI translation cache
> to not require moving elements in the linked-list and take advantage of
> this to make it an rculist readable outside of the lpi_list_lock.

I quite like the overall direction. I've left a few comments here and
there, and will probably get back to it after I try to run some tests
on a big-ish box.

> All of this was tested on top of v6.8-rc1. Apologies if any of the
> changelogs are a bit too light, I'm happy to rework those further in
> subsequent revisions.
>
> I would've liked to have benchmark data showing the improvement on top
> of upstream with this series, but I'm currently having issues with our
> internal infrastructure and upstream kernels. However, this series has
> been found to have a near 2x performance improvement to redis-memtier [*]
> benchmarks on our kernel tree.

It'd be really good to have upstream-based numbers, with details of
the actual setup (device assignment? virtio?) so that we can compare
things and even track regressions in the future.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.