Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath

Wanpeng Li <kernellwp@xxxxxxxxx> · Tue, 12 Nov 2019 09:33:49 +0800

On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 09/11/19 08:05, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> >
> > This patch tries to optimize x2apic physical destination mode, fixed delivery
> > mode single target IPI by delivering IPI to receiver immediately after sender
> > writes ICR vmexit to avoid various checks when possible.
> >
> > Testing on Xeon Skylake server:
> >
> > The virtual IPI latency from sender send to receiver receive reduces more than
> > 330+ cpu cycles.
> >
> > Running hackbench(reschedule ipi) in the guest, the avg handle time of MSR_WRITE
> > caused vmexit reduces more than 1000+ cpu cycles:
> >
> > Before patch:
> >
> >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us
> >
> > After patch:
> >
> >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us
>
> Do you have retpolines enabled?  The bulk of the speedup might come just
> from the indirect jump.

With 'mitigations=off' added to the host kernel command line (via grub):

Before patch:

  VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    2681713    92.98%    77.52%      0.38us     18.54us    0.73us ( +-   0.02% )

After patch:

  VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    2953447    92.48%    62.47%      0.30us     59.09us    0.40us ( +-   0.02% )
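
To put the two sets of numbers side by side, here is a small sketch that
computes the relative drop in average MSR_WRITE exit time for each run.
The figures are the Avg time values quoted in this thread; the function
name and the run labels are only illustrative.

```python
def reduction_pct(before_us: float, after_us: float) -> float:
    """Relative reduction in average exit handling time, in percent."""
    return (before_us - after_us) / before_us * 100.0

# (before, after) average MSR_WRITE exit time in microseconds,
# copied from the tables in this thread.
runs = {
    "original run": (1.08, 0.58),
    "mitigations=off": (0.73, 0.40),
}

results = {name: reduction_pct(b, a) for name, (b, a) in runs.items()}

for name, pct in results.items():
    print(f"{name}: {pct:.1f}% lower average MSR_WRITE exit time")
```

Both runs come out at a reduction of roughly 45%, so the relative gain is
about the same with mitigations disabled.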

Actually, this is not the first attempt to add a shortcut for
performance-sensitive MSR writes; the other effort is the tscdeadline
timer work from Isaku Yamahata,
https://patchwork.kernel.org/cover/10541035/ .
In what we observe in our products, ICR and TSCDEADLINE MSR writes
account for most of the MSR write vmexits, and multicast IPIs are not as
common as unicast IPIs such as RESCHEDULE_VECTOR and
CALL_FUNCTION_SINGLE_VECTOR. As far as I know, something similar to this
patch has already been deployed in some cloud companies' private KVM
forks.
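
For readers less familiar with the ICR layout, the unicast case discussed
here (x2apic, physical destination mode, fixed delivery mode, single
target) can be recognised from the 64-bit value written to the ICR MSR.
The following is a rough sketch of that check, not the code from the
patch. The bit positions follow the x2APIC ICR format; the constant and
function names are made up for illustration.

```python
# x2APIC ICR fields (64-bit MSR value). Names are illustrative only.
ICR_DELIVERY_MODE_MASK = 0x7 << 8    # bits 8-10, 0 = fixed
ICR_DEST_MODE_LOGICAL = 1 << 11      # bit 11, clear = physical
ICR_SHORTHAND_MASK = 0x3 << 18       # bits 18-19, 0 = no shorthand
X2APIC_BROADCAST = 0xFFFFFFFF        # destination meaning "all CPUs"

def is_single_target_fixed_ipi(icr: int) -> bool:
    """True if the ICR value describes a unicast, fixed-mode, physical IPI."""
    dest = (icr >> 32) & 0xFFFFFFFF  # x2APIC destination ID, bits 32-63
    return (
        (icr & ICR_DELIVERY_MODE_MASK) == 0
        and (icr & ICR_DEST_MODE_LOGICAL) == 0
        and (icr & ICR_SHORTHAND_MASK) == 0
        and dest != X2APIC_BROADCAST
    )

# Example: vector 0xFD sent to the CPU with APIC ID 3.
unicast = (3 << 32) | 0xFD
# Same vector with the "all excluding self" shorthand (bits 18-19 = 0b11).
multicast = (0x3 << 18) | 0xFD

print(is_single_target_fixed_ipi(unicast))
print(is_single_target_fixed_ipi(multicast))
```

Anything that fails this check (shorthands, logical mode, broadcast, or
non-fixed delivery modes such as NMI) would go through the normal, slower
path.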

    Wanpeng