On Wed, Oct 24, 2018 at 09:48:25PM -0700, Isaku Yamahata wrote:
> This patch series implements xmm fast hypercall for hyper-v as guest
> and kvm support as VMM.

I think it may be a good idea to do it in separate patchsets.  They're
probably targeted at different maintainer trees (x86/hyperv vs kvm) and
the only thing they have in common is a couple of new defines in
hyperv-tlfs.h.

> With this patch, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE without
> gva list, HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX (vcpu > 64) and
> HVCALL_SEND_IPI_EX (vcpu > 64) can use xmm fast hypercall.
>
> benchmark result:
> At the moment, my test machine has only pcpu=4, so the ipi benchmark
> doesn't show any behaviour change. So for now I measured the time of
> hyperv_flush_tlb_others() with ktap while running 'hardinfo -r -f text'.

This suggests that the guest OS was Linux with your patches 1-4.  What
was the hypervisor?  KVM with your patch 5 or Hyper-V proper?

> the average of 5 runs is as follows.
> (When a large machine with pcpu > 64 is available, the ipi_benchmark
> result will be interesting. But not yet.)

Are you referring to https://patchwork.kernel.org/patch/10122703/ ?
Has it landed anywhere in the tree?  I seem unable to find it...

> hyperv_flush_tlb_others() time by hardinfo -r -f text:
>
> with patch:    9931 ns
> without patch: 11111 ns
>
>
> With commit 4bd06060762b, __send_ipi_mask() now uses the fast hypercall
> when possible, which is the case with vcpu=4. So I used a kernel before
> that patch to measure the effect of the xmm fast hypercall with
> ipi_benchmark. The following is the average of 100 runs.
>
> ipi_benchmark: average of 100 runs without 4bd06060762b
>
> with patch:
> Dry-run                   0        495181
> Self-IPI           11352737      21549999
> Normal IPI        499400218     575433727
> Broadcast IPI             0    1700692010
> Broadcast lock            0    1663001374
>
> without patch:
> Dry-run                   0        607657
> Self-IPI           10915950      21217644
> Normal IPI        621712609     735015570

This is a difference of about 122 ms in IPI sending time and about 160 ms
in total time, i.e. an extra ~38 ms for the acknowledge.  AFAICS the
acknowledge path should be exactly the same.  Any idea where these
additional 38 ms come from?

> Broadcast IPI             0    2173803373

This one is strange, too: the difference should only be on the sending
side, and there it should be basically constant with the number of cpus.
So I would expect the patched vs unpatched delta to be about the same as
for "Normal IPI".  Am I missing something?

> Broadcast lock            0    2150451543

Appended below, for reference, are a rough sketch of what the guest-side
XMM input packing might look like and a quick recomputation of the deltas
above.

Thanks,
Roman.
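
For reference, a rough sketch of the guest-side XMM fast hypercall input
path discussed above.  This is illustrative only, not the code from the
series: the helper name hv_do_xmm_fast_hypercall(), the buffer layout and
the direct vmcall are all assumptions; a real guest calls through the
hypercall page and checks the XMM-input feature bit advertised via CPUID
before using this form.

/*
 * Sketch: issue a Hyper-V "XMM fast" hypercall from the guest.  Per the
 * TLFS, a fast hypercall passes the first 16 bytes of input in RDX and R8;
 * with XMM fast input the remainder (up to 112 bytes total) goes in
 * XMM0-XMM5.
 */
#include <stdint.h>
#include <string.h>

#define HV_HYPERCALL_FAST_BIT	(1ULL << 16)	/* "fast" flag in the control word */

static uint64_t hv_do_xmm_fast_hypercall(uint16_t code, const void *input,
					 unsigned int in_bytes)
{
	uint64_t control = (uint64_t)code | HV_HYPERCALL_FAST_BIT;
	uint64_t gpr[2] = { 0, 0 };		/* first 16 bytes -> RDX, R8 */
	uint8_t xmm_buf[6 * 16] = { 0 };	/* remainder -> XMM0..XMM5 */
	uint64_t status;

	memcpy(gpr, input, in_bytes < 16 ? in_bytes : 16);
	if (in_bytes > 16)
		memcpy(xmm_buf, (const uint8_t *)input + 16, in_bytes - 16);

	{
		register uint64_t r8 asm("r8") = gpr[1];

		asm volatile("movdqu  0(%[xmm]), %%xmm0\n\t"
			     "movdqu 16(%[xmm]), %%xmm1\n\t"
			     "movdqu 32(%[xmm]), %%xmm2\n\t"
			     "movdqu 48(%[xmm]), %%xmm3\n\t"
			     "movdqu 64(%[xmm]), %%xmm4\n\t"
			     "movdqu 80(%[xmm]), %%xmm5\n\t"
			     "vmcall"	/* stand-in for a call through the
					 * hypercall page (vmmcall on AMD) */
			     : "=a" (status)
			     : "c" (control), "d" (gpr[0]), "r" (r8),
			       [xmm] "r" (xmm_buf)
			     : "cc", "memory",
			       "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5");
	}

	return status;	/* hypercall result value, 0 on success */
}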
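
And a quick recomputation of the deltas quoted above from the "Normal IPI"
rows of the ipi_benchmark output; the reading of the columns (first column
= time spent on the sending CPU, second column = total time until the
targets have acknowledged) is my assumption.

#include <stdio.h>

int main(void)
{
	/* "Normal IPI" row, with / without the xmm fast hypercall patch */
	long long send_patched   = 499400218, total_patched   = 575433727;
	long long send_unpatched = 621712609, total_unpatched = 735015570;

	long long d_send  = send_unpatched  - send_patched;	/* sending side */
	long long d_total = total_unpatched - total_patched;	/* send + ack */

	printf("send delta:  %.1f ms\n", d_send / 1e6);			/* ~122 ms */
	printf("total delta: %.1f ms\n", d_total / 1e6);		/* ~160 ms */
	printf("ack delta:   %.1f ms\n", (d_total - d_send) / 1e6);	/* ~37-38 ms */
	return 0;
}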