On Fri, 26 Jul 2019 13:35:20 +0100, Xiangyou Xie <xiexiangyou@xxxxxxxxxx> wrote: > > Hi Marc, > > Sorry, the test data was not posted in the previous email. > > I tested the impact of virtual interrupt injection time-consuming: > Run the iperf command to send UDP packets to the VM: > iperf -c $IP -u -b 40m -l 64 -t 6000& > The vm just receive UDP traffic. When configure multiple NICs, each > NIC receives the above iperf UDP traffic, This may reflect the > performance impact of shared resource competition, such as lock. > > Observing the delay of virtual interrupt injection: the time spent by > the "irqfd_wakeup", "irqfd_inject" function, and kworker context > switch. > The less the better. > > ITS translation cache greatly reduces the delay of interrupt injection > compared to kworker thread, because it eliminate wakeup and uncertain > scheduling delay: > kworker ITS translation cache improved > 1 NIC 6.692 us 1.766 us > 73.6% > 10 NICs 7.536 us 2.574 us > 65.8% OK, that's pretty interesting. It would have been good to post such data in reply to the ITS cache series. > > Increases the number of lpi_translation_cache reduce lock competition. > Multi-interrupt concurrent injections perform better: > > ITS translation cache with patch improved > 1 NIC 1.766 us 1.694 us 4.1% > 10 NICs 2.574 us 1.848 us 28.2% That's sort off interesting too, but it doesn't answer any of the questions I had in response to your patch: How do you ensure mutual exclusion with the LPI list being concurrently updated? How does the new locking fit in the current locking scheme? Thanks, M. > Regards, > > Xiangyou > > On 2019/7/24 19:09, Marc Zyngier wrote: > > Hi Xiangyou, > > > > On 24/07/2019 10:04, Xiangyou Xie wrote: > >> Because dist->lpi_list_lock is a perVM lock, when a virtual machine > >> is configured with multiple virtual NIC devices and receives > >> network packets at the same time, dist->lpi_list_lock will become > >> a performance bottleneck. > > > > I'm sorry, but you'll have to show me some real numbers before I > > consider any of this. There is a reason why the original series still > > isn't in mainline, and that's because people don't post any numbers. > > Adding more patches is not going to change that small fact. > > > >> This patch increases the number of lpi_translation_cache to eight, > >> hashes the cpuid that executes irqfd_wakeup, and chooses which > >> lpi_translation_cache to use. > > > > So you've now switched to a per-cache lock, meaning that the rest of the > > ITS code can manipulate the the lpi_list without synchronization with > > the caches. Have you worked out all the possible races? Also, how does > > this new lock class fits in the whole locking hierarchy? > > > > If you want something that is actually scalable, do it the right way. > > Use a better data structure than a list, switch to using RCU rather than > > the current locking strategy. But your current approach looks quite fragile. > > > > Thanks, > > > > M. > > > -- Jazz is not dead, it just smells funny.