On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> This patch series implements xmm fast hypercall for hyper-v as guest
> and kvm support as VMM.

I think it may be a good idea to do it in separate patchsets.  They're
probably targeted at different maintainer trees (x86/hyperv vs kvm) and
the only thing they have in common is a couple of new defines in
hyperv-tlfs.h.

> With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without
> gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> HVCALL_SEND_IPI_EX (vcpu > 64) can use xmm fast hypercall.
>
> benchmark result:
> At the moment, my test machine has only pcpu=4, so the ipi benchmark
> doesn't show any behaviour change. So for now I measured the time of
> hyperv_flush_tlb_others() with ktap while running 'hardinfo -r -f text'.

This suggests that the guest OS was Linux with your patches 1-4.  What
was the hypervisor?  KVM with your patch 5 or Hyper-V proper?

> the average of 5 runs is as follows.
> (When a large machine with pcpu > 64 is available, the ipi_benchmark
> result will be interesting. But not yet.)

Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
Has it landed anywhere in the tree?  I seem unable to find it...

> hyperv_flush_tlb_others() time by hardinfo -r -f text:
>
> with patch:    9931 ns
> without patch: 11111 ns
>
>
> With commit 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> when possible, which is the case with vcpu=4. So I used a kernel before
> that patch to measure the effect of the xmm fast hypercall with
> ipi_benchmark. The following is the average of 100 runs.
>
> ipi_benchmark: average of 100 runs without 4bd06060762b
>
> with patch:
> Dry-run                   0        495181
> Self-IPI           11352737      21549999
> Normal IPI        499400218     575433727
> Broadcast IPI             0    1700692010
> Broadcast lock            0    1663001374
>
> without patch:
> Dry-run                   0        607657
> Self-IPI           10915950      21217644
> Normal IPI        621712609     735015570

This is a difference of about 122 ms in IPI sending time and about 160 ms
in total time, i.e. an extra ~38 ms for the acknowledge.  AFAICS the
acknowledge path should be exactly the same.  Any idea where these
additional 38 ms come from?

> Broadcast IPI             0    2173803373

This one is strange, too: the difference should only be on the sending
side, and there it should be basically constant with the number of cpus.
So I would expect the patched vs unpatched delta to be about the same as
for "Normal IPI".  Am I missing something?

> Broadcast lock            0    2150451543

Appended below, for reference, are a rough sketch of what the guest-side
XMM input packing might look like and a quick recomputation of the deltas
above.

Thanks,
Roman.
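
For reference, a rough sketch of the guest-side XMM fast hypercall input
path discussed above.  This is illustrative only, not the code from the
series: the helper name hv_do_xmm_fast_hypercall(), the buffer layout and
the direct vmcall are all assumptions; a real guest calls through the
hypercall page and checks the XMM-input feature bit advertised via CPUID
before using this form.

/*
 * Sketch: issue a Hyper-V "XMM fast" hypercall from the guest.  Per the
 * TLFS, a fast hypercall passes the first 16 bytes of input in RDX and R8;
 * with XMM fast input the remainder (up to 112 bytes total) goes in
 * XMM0-XMM5.
 */
#include <stdint.h>
#include <string.h>

#define HV_HYPERCALL_FAST_BIT	(1ULL << 16)	/* "fast" flag in the control word */

static uint64_t hv_do_xmm_fast_hypercall(uint16_t code, const void *input,
					 unsigned int in_bytes)
{
	uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT;
	uint64_t gpr[2] = { 0, 0 };		/* first 16 bytes -> RDX, R8 */
	uint8_t xmm_buf[6 * 16] = { 0 };	/* remainder -> XMM0..XMM5 */
	uint64_t status;

	memcpy(gpr, input, in_bytes < 16 ? in_bytes : 16);
	if (in_bytes > 16)
		memcpy(xmm_buf, (const uint8_t *)input + 16, in_bytes - 16);

	{
		register uint64_t r8 asm("r8") = gpr[1];

		asm volatile("movdqu  0(%[xmm]), %%xmm0\n\t"
			     "movdqu 16(%[xmm]), %%xmm1\n\t"
			     "movdqu 32(%[xmm]), %%xmm2\n\t"
			     "movdqu 48(%[xmm]), %%xmm3\n\t"
			     "movdqu 64(%[xmm]), %%xmm4\n\t"
			     "movdqu 80(%[xmm]), %%xmm5\n\t"
			     "vmcall"	/* stand-in for a call through the
					 * hypercall page (vmmcall on AMD) */
			     : "=a" (status)
			     : "c" (control), "d" (gpr[0]), "r" (r8),
			       [xmm] "r" (xmm_buf)
			     : "cc", "memory",
			       "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5");
	}

	return status;	/* hypercall result value, 0 on success */
}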
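
And a quick recomputation of the deltas quoted above from the "Normal IPI"
rows of the ipi_benchmark output; the reading of the columns (first column
= time spent on the sending CPU, second column = total time until the
targets have acknowledged) is my assumption.

#include <stdio.h>

int main(void)
{
	/* "Normal IPI" row, with / without the xmm fast hypercall patch */
	long long send_patched   = 499400218, total_patched   = 575433727;
	long long send_unpatched = 621712609, total_unpatched = 735015570;

	long long d_send  = send_unpatched  - send_patched;	/* sending side */
	long long d_total = total_unpatched - total_patched;	/* send + ack */

	printf("send delta:  %.1f ms\n", d_send / 1e6);			/* ~122 ms */
	printf("total delta: %.1f ms\n", d_total / 1e6);		/* ~160 ms */
	printf("ack delta:   %.1f ms\n", (d_total - d_send) / 1e6);	/* ~37-38 ms */
	return 0;
}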