On Mon, Oct 29, 2018 at 06:22:14PM +0000, Roman Kagan <rkagan@xxxxxxxxxxxxx> wrote:
> On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> > This patch series implements xmm fast hypercall for hyper-v as guest
> > and kvm support as VMM.
>
> I think it may be a good idea to do it in separate patchsets. They're
> probably targeted at different maintainer trees (x86/hyperv vs kvm) and
> the only thing they have in common is a couple of new defines in
> hyperv-tlfs.h.
>
> > With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without
> > gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> > HVCALL_SEND_IPI_EX (vcpu > 64) can use xmm fast hypercall.
> >
> > benchmark result:
> > At the moment, my test machine has only pcpu=4, so the ipi benchmark
> > doesn't show any behaviour change. So for now I measured the time of
> > hyperv_flush_tlb_others() by ktap with 'hardinfo -r -f text'.
>
> This suggests that the guest OS was Linux with your patches 1-4. What
> was the hypervisor? KVM with your patch 5 or Hyper-V proper?

For patches 1-4, it's Hyper-V. For patch 5, it's KVM with Hyper-V
hypercall support.
I'll split this patch series to avoid confusion.

> > The average of 5 runs is as follows.
> > (When a large machine with pcpu > 64 is available, the ipi_benchmark
> > result will be interesting. But not yet.)
>
> Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
> Has it landed anywhere in the tree? I seem unable to find it...

Yes, that patch. It's not merged yet.

> > hyperv_flush_tlb_others() time by hardinfo -r -f text:
> >
> > with patch:    9931 ns
> > without patch: 11111 ns
> >
> >
> > With commit 4bd06060762b, __send_ipi_mask() already uses a fast
> > hypercall when possible, which is the case with vcpu=4. So I used a
> > kernel without that patch to measure the effect of the xmm fast
> > hypercall with ipi_benchmark.
> > The following is the average of 100 runs.
> >
> > ipi_benchmark: average of 100 runs without 4bd06060762b
> >
> > with patch:
> > Dry-run                 0       495181
> > Self-IPI         11352737     21549999
> > Normal IPI      499400218    575433727
> > Broadcast IPI           0   1700692010
> > Broadcast lock          0   1663001374
> >
> > without patch:
> > Dry-run                 0       607657
> > Self-IPI         10915950     21217644
> > Normal IPI      621712609    735015570
>
> This is about 122 ms difference in IPI sending time, and 160 ms in
> total time, i.e. extra 38 ms for the acknowledge. AFAICS the
> acknowledge path should be exactly the same. Any idea where these
> additional 38 ms come from?
>
> > Broadcast IPI           0   2173803373
>
> This one is strange, too: the difference should only be on the sending
> side, and there it should be basically constant with the number of cpus.
> So I would expect the patched vs unpatched delta to be about the same as
> for "Normal IPI". Am I missing something?

The result seems very sensitive to host activity and is therefore
unstable (pcpu=vcpu=4 in the benchmark). Since the benchmark should be
run on a large machine (vcpu > 64) anyway, I didn't dig further.

Thanks,

> > Broadcast lock          0   2150451543
>
> Thanks,
> Roman.

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
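
As a rough illustration of the mechanism being discussed: with an XMM
(extended) fast hypercall the guest sets the fast bit in the hypercall
control word, passes the first 16 bytes of input in RDX/R8, and loads the
remaining input into XMM0-XMM5 instead of handing the hypervisor the GPA
of an input page. The sketch below is hypothetical and is not the code
from this series: HV_HYPERCALL_FAST_BIT and HV_HYPERCALL_RESULT_MASK are
the existing hyperv-tlfs.h defines, but the function name is made up, it
only covers a 32-byte input (16 bytes in GPRs plus 16 bytes in XMM0), and
for brevity it issues VMCALL directly (Intel under KVM only) rather than
going through the hypercall page.

/*
 * Hypothetical sketch of a guest-side XMM fast hypercall for a 32-byte
 * input block: bytes 0..15 go in RDX/R8, bytes 16..31 go in XMM0.
 */
#include <linux/types.h>
#include <asm/fpu/api.h>
#include <asm/hyperv-tlfs.h>

static u64 hv_do_xmm_fast_hypercall(u16 code, const void *input)
{
	u64 control = (u64)code | HV_HYPERCALL_FAST_BIT;
	const u64 *in = input;
	u64 in_low = in[0];
	register u64 in_high asm("r8") = in[1];
	u64 status;

	/* XMM registers are FPU state; guard their use in the kernel. */
	kernel_fpu_begin();

	/* Load bytes 16..31 of the input block into XMM0. */
	asm volatile("movdqu %0, %%xmm0"
		     : : "m" (*(const u8 (*)[16])(in + 2)));

	/* Sketch only: VMCALL directly instead of the hypercall page. */
	asm volatile("vmcall"
		     : "=a" (status), "+c" (control), "+d" (in_low)
		     : "r" (in_high)
		     : "cc", "memory", "r9", "r10", "r11");

	kernel_fpu_end();

	return status & HV_HYPERCALL_RESULT_MASK;
}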