On Mon, Oct 29, 2018 at 06:22:14PM +0000, Roman Kagan <rkagan@xxxxxxxxxxxxx> wrote:
> On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> > This patch series implements xmm fast hypercall for hyper-v as guest
> > and kvm support as VMM.
>
> I think it may be a good idea to do it in separate patchsets. They're
> probably targeted at different maintainer trees (x86/hyperv vs kvm) and
> the only thing they have in common is a couple of new defines in
> hyperv-tlfs.h.
>
> > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without
> > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > HVCALL_SEND_IPI_EX (vcpu > 64) can use xmm fast hypercall.
> >
> > benchmark result:
> > At the moment, my test machine has only pcpu=4, so the ipi benchmark
> > doesn't show any behaviour change. So for now I measured the time of
> > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
>
> This suggests that the guest OS was Linux with your patches 1-4. What
> was the hypervisor? KVM with your patch 5 or Hyper-V proper?

For patches 1-4, it's Hyper-V. For patch 5, it's KVM with Hyper-V
hypercall support.
I'll split this patch series to avoid confusion.

> > The average of 5 runs is as follows.
> > (When a large machine with pcpu > 64 is available, the ipi_benchmark
> > result will be interesting. But not yet.)
>
> Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
> Has it landed anywhere in the tree? I seem unable to find it...

Yes, that patch. It's not merged yet.

> > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> >
> > with patch:    9931 ns
> > without patch: 11111 ns
> >
> >
> > With commit 4bd06060762b, __send_ipi_mask() already uses a fast
> > hypercall when possible, which is the case with vcpu=4. So I used a
> > kernel without that patch to measure the effect of the xmm fast
> > hypercall with ipi_benchmark.
> > The following is the average of 100 runs.
> >
> > ipi_benchmark: average of 100 runs without 4bd06060762b
> >
> > with patch:
> > Dry-run                 0       495181
> > Self-IPI         11352737     21549999
> > Normal IPI      499400218    575433727
> > Broadcast IPI           0   1700692010
> > Broadcast lock          0   1663001374
> >
> > without patch:
> > Dry-run                 0       607657
> > Self-IPI         10915950     21217644
> > Normal IPI      621712609    735015570
>
> This is about 122 ms difference in IPI sending time, and 160 ms in
> total time, i.e. extra 38 ms for the acknowledge. AFAICS the
> acknowledge path should be exactly the same. Any idea where these
> additional 38 ms come from?
>
> > Broadcast IPI           0   2173803373
>
> This one is strange, too: the difference should only be on the sending
> side, and there it should be basically constant with the number of cpus.
> So I would expect the patched vs unpatched delta to be about the same as
> for "Normal IPI". Am I missing something?

The result seems very sensitive to host activity and is therefore
unstable (pcpu=vcpu=4 in the benchmark). Since the benchmark should be
run on a large machine (vcpu > 64) anyway, I didn't dig further.

Thanks,

> > Broadcast lock          0   2150451543
>
> Thanks,
> Roman.

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
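
As a rough illustration of the mechanism being discussed: with an XMM
(extended) fast hypercall the guest sets the fast bit in the hypercall
control word, passes the first 16 bytes of input in RDX/R8, and loads the
remaining input into XMM0-XMM5 instead of handing the hypervisor the GPA
of an input page. The sketch below is hypothetical and is not the code
from this series: HV_HYPERCALL_FAST_BIT and HV_HYPERCALL_RESULT_MASK are
the existing hyperv-tlfs.h defines, but the function name is made up, it
only covers a 32-byte input (16 bytes in GPRs plus 16 bytes in XMM0), and
for brevity it issues VMCALL directly (Intel under KVM only) rather than
going through the hypercall page.

/*
 * Hypothetical sketch of a guest-side XMM fast hypercall for a 32-byte
 * input block: bytes 0..15 go in RDX/R8, bytes 16..31 go in XMM0.
 */
#include <linux/types.h>
#include <asm/fpu/api.h>
#include <asm/hyperv-tlfs.h>

static u64 hv_do_xmm_fast_hypercall(u16 code, const void *input)
{
	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
	const u64 *in = input;
	u64 in_low = in[0];
	register u64 in_high asm("r8") = in[1];
	u64 status;

	/* XMM registers are FPU state; guard their use in the kernel. */
	kernel_fpu_begin();

	/* Load bytes 16..31 of the input block into XMM0. */
	asm volatile("movdqu %0, %%xmm0"
		     : : "m" (*(const u8 (*)[16])(in + 2)));

	/* Sketch only: VMCALL directly instead of the hypercall page. */
	asm volatile("vmcall"
		     : "=a" (status), "+c" (control), "+d" (in_low)
		     : "r" (in_high)
		     : "cc", "memory", "r9", "r10", "r11");

	kernel_fpu_end();

	return status & HV_HYPERCALL_RESULT_MASK;
}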