Re: [RFC] KVM: arm/arm64: optimize vSGI injection performance

zhaoxu <zhaoxu.35@xxxxxxxxxxxxx> · Wed, 23 Aug 2023 11:19:23 +0800

On 2023/8/22 16:28, Marc Zyngier wrote:
On Tue, 22 Aug 2023 04:51:30 +0100,
zhaoxu <zhaoxu.35@xxxxxxxxxxxxx> wrote:
In fact, the core vCPU search algorithm remains the same in the latest
kernel: iterate over all vCPUs and, if the MPIDR matches, inject. The
next version will be based on the latest kernel.

My point is that performance numbers on such an ancient kernel hardly
make any sense, as a large portion of the code will be different. We
aim to live in the future, not in the past.

Yes, I got it, thanks.

- which current guest OS *currently* make use of broadcast or 1:N
    SGIs? Linux doesn't and overall SGI multicasting is pretty useless
    to an OS.

[...]
Yes, arm64 Linux almost never sends broadcast IPIs. I will use other
test data to demonstrate the performance improvement.

Exactly. I also contend that *no* operating system uses broadcast (or
even multicast) signalling, because this is a very pointless
operation.

So what are you optimising for?

Explanation at the end.

   /*
- * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
- * generation register ICC_SGI1R_EL1) with a given VCPU.
- * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
- * return -1.
+ * Get affinity routing index from ICC_SGI_* register
+ * format:
+ *     aff3       aff2       aff1            aff0
+ * |- 8 bits -|- 8 bits -|- 8 bits -|- 4 bits or 8bits -|

OK, so you are implementing RSS support:

- Why isn't that mentioned anywhere in the commit log?

- Given that KVM actively limits the MPIDR to 4 bits at Aff0, how does
    it even work in the first place?

- How is that advertised to the guest?

- How can the guest enable RSS support?

Thanks for mentioning that. I also checked the relevant code: the guest
can't enable RSS, so this was my oversight. This part has been removed
in the next version.

Then what's the point of your patch? You don't explain anything, which
makes it very hard to guess what you're aiming for.

This patch aims to optimize the vCPU search algorithm when injecting a
vSGI.

For example, in a 64-core VM, the CPU topology consists of 4 aff0 groups
(0-15, 16-31, 32-47, 48-63). When the guest wants to send an SGI to core
63, the previous logic has KVM iterate over all vCPUs with
kvm_for_each_vcpu to find core 63, and then inject the vSGI into it.
However, the ICC_SGI* register provides affinity routing information,
which lets us skip the first three aff0 groups and start directly with
the last one. As a result, the number of iterations is reduced from the
number of vCPUs (64 in this case) to 16 or 8 (using a mask to determine
how the targets are distributed in the target list of the ICC_SGI*
register).

The effect of this optimization is evident under the following
conditions: 1. The VM has more than 16 cores. 2. The target vCPU is
located after the 16th core. The patch must therefore also ensure that
performance does not deteriorate when the target is in the first aff0
group (cores 0-15), which is why I included that test data in the patch.
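
To make the idea concrete, here is a minimal user-space sketch of the two
lookup strategies. It is not the patch itself: struct toy_vcpu and all
toy_* names are hypothetical, and it assumes that vcpu_id maps to
Aff1 = id / 16 and Aff0 = id % 16 (Aff0 limited to 4 bits, as noted
above), with Aff2/Aff3 ignored and at most 64 vCPUs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_VCPUS   64u
#define TOY_GROUP_SIZE 16u /* Aff0 is 4 bits wide: 16 vCPUs per Aff1 value */

/* Hypothetical stand-in for a vCPU: only the packed affinity is kept. */
struct toy_vcpu {
	uint32_t mpidr; /* Aff1 in bits [15:8], Aff0 in bits [3:0] */
};

/* Assumed mapping: vcpu_id -> (Aff1 = id / 16, Aff0 = id % 16). */
static void toy_init(struct toy_vcpu *vcpus, unsigned int nr)
{
	for (unsigned int id = 0; id < nr; id++)
		vcpus[id].mpidr = ((id / TOY_GROUP_SIZE) << 8) |
				  (id % TOY_GROUP_SIZE);
}

/* Previous logic: visit every vCPU and compare its affinity. */
static unsigned int toy_scan_all(const struct toy_vcpu *vcpus, unsigned int nr,
				 uint32_t aff1, uint16_t target_list,
				 uint64_t *hits)
{
	unsigned int visited = 0;

	for (unsigned int id = 0; id < nr; id++) {
		uint32_t aff0 = vcpus[id].mpidr & 0xf;

		visited++;
		if (((vcpus[id].mpidr >> 8) & 0xff) != aff1)
			continue;
		if (target_list & (1u << aff0))
			*hits |= 1ULL << id; /* "inject" into this vCPU */
	}
	return visited;
}

/* Proposed logic: jump to the group selected by Aff1, visit only its slots. */
static unsigned int toy_scan_group(const struct toy_vcpu *vcpus, unsigned int nr,
				   uint32_t aff1, uint16_t target_list,
				   uint64_t *hits)
{
	unsigned int base = aff1 * TOY_GROUP_SIZE;
	unsigned int visited = 0;

	for (unsigned int aff0 = 0; aff0 < TOY_GROUP_SIZE; aff0++) {
		unsigned int id = base + aff0;

		if (id >= nr)
			break;
		visited++;
		/* The index is derived, so check it against the stored MPIDR. */
		assert(vcpus[id].mpidr == ((aff1 << 8) | aff0));
		if (target_list & (1u << aff0))
			*hits |= 1ULL << id; /* "inject" into this vCPU */
	}
	return visited;
}
```

For an SGI aimed at core 63 (Aff1 = 3, bit 15 set in the target list),
toy_scan_all() visits all 64 entries, while toy_scan_group() visits only
the 16 entries of the last group; both select the same vCPU.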

       M.

	Xu.