On Tue, 12 Nov 2019 at 09:33, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>
> On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >
> > On 09/11/19 08:05, Wanpeng Li wrote:
> > > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> > >
> > > This patch tries to optimize the x2apic physical destination mode, fixed
> > > delivery mode, single-target IPI case by delivering the IPI to the receiver
> > > immediately after the sender's ICR-write vmexit, avoiding various checks
> > > when possible.
> > >
> > > Testing on a Xeon Skylake server:
> > >
> > > The virtual IPI latency from sender send to receiver receive is reduced by
> > > more than 330 cpu cycles.
> > >
> > > Running hackbench (reschedule IPI) in the guest, the average handling time
> > > of MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:
> > >
> > > Before patch:
> > >
> > > VM-EXIT    Samples  Samples%  Time%   Min Time  Max Time  Avg time
> > > MSR_WRITE  5417390  90.01%    16.31%  0.69us    159.60us  1.08us
> > >
> > > After patch:
> > >
> > > VM-EXIT    Samples  Samples%  Time%   Min Time  Max Time  Avg time
> > > MSR_WRITE  6726109  90.73%    62.18%  0.48us    191.27us  0.58us
> >
> > Do you have retpolines enabled?  The bulk of the speedup might come just
> > from the indirect jump.
>
> Adding 'mitigations=off' to the host grub parameters:
>
> Before patch:
>
> VM-EXIT    Samples  Samples%  Time%   Min Time  Max Time  Avg time
> MSR_WRITE  2681713  92.98%    77.52%  0.38us    18.54us   0.73us ( +- 0.02% )
>
> After patch:
>
> VM-EXIT    Samples  Samples%  Time%   Min Time  Max Time  Avg time
> MSR_WRITE  2953447  92.48%    62.47%  0.30us    59.09us   0.40us ( +- 0.02% )

Hmm, the lower per-vmexit time on the sender side comes from the fact that the
kvm_exit tracepoint is still left in vmx_handle_exit while the ICR wrmsr
handling is moved ahead of it, which is why the time between the kvm_exit and
kvm_entry tracepoints shrinks. The virtual IPI latency is still reduced by
330+ cycles, though.

    Wanpeng
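(A minimal, illustrative sketch of the ICR screening such a single-target fast
path has to do — fixed delivery mode, physical destination mode, no destination
shorthand, non-broadcast target. The bit layout follows the Intel SDM x2APIC ICR
format; the helper name icr_is_fastpath_candidate() is made up here for
illustration and is not the code from the patch.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ICR_DELIVERY_MASK   0x00000700ULL  /* bits 10:8  delivery mode      */
#define ICR_DM_FIXED        0x00000000ULL  /*   000b = Fixed                */
#define ICR_DEST_LOGICAL    0x00000800ULL  /* bit 11 set = logical dest     */
#define ICR_SHORTHAND_MASK  0x000c0000ULL  /* bits 19:18 dest shorthand     */
#define ICR_DEST_SHIFT      32             /* bits 63:32 destination id     */
#define X2APIC_BROADCAST    0xffffffffULL

/* Hypothetical helper: does this ICR write qualify for the fast path? */
static bool icr_is_fastpath_candidate(uint64_t icr)
{
	uint32_t dest = (uint32_t)(icr >> ICR_DEST_SHIFT);

	return (icr & ICR_DELIVERY_MASK) == ICR_DM_FIXED && /* fixed delivery */
	       !(icr & ICR_DEST_LOGICAL) &&                 /* physical dest  */
	       !(icr & ICR_SHORTHAND_MASK) &&               /* no shorthand   */
	       dest != X2APIC_BROADCAST;                    /* single target  */
}

int main(void)
{
	/* Example: vector 0xfd to physical APIC id 3, fixed delivery. */
	uint64_t icr = (3ULL << ICR_DEST_SHIFT) | 0xfd;

	printf("fast path: %s\n",
	       icr_is_fastpath_candidate(icr) ? "yes" : "no");
	return 0;
}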