> On Apr 23, 2019, at 04:54, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2019 at 09:24:59PM +0800, zhenwei pi wrote:
>> Hit several typical cases of performance drop due to vm-exit:
>> case 1, jemalloc calls madvise(void *addr, size_t length, MADV_DONTNEED)
>> in the guest; the resulting IPIs cause a lot of exits.
>> case 2, atop collects IPC via perf hardware events in the guest; vpmu &
>> rdpmc exits increase a lot.
>> case 3, host memory compaction invalidates the TDP, and tdp_page_fault
>> can cause a huge loss of performance.
>> case 4, web services (written in golang) call futex and have higher
>> latency than in the host OS environment.
>>
>> Add more vm-exit reason debug entries; they are helpful for recognizing
>> the cause of a performance drop. In this patch:
>> 1, add more vm-exit reasons.
>> 2, add wrmsr details.
>> 3, add CR details.
>> 4, add hypercall details.
>>
>> Currently we can also achieve the same result with bpf.
>> Sample code (written by Fei Li <lifei.shirley@xxxxxxxxxxxxx>):
>>
>> from bcc import BPF
>>
>> b = BPF(text="""
>> /* Per-MSR exit statistics, keyed by MSR index (args->ecx). */
>> struct kvm_msr_exit_info {
>>     u32 pid;
>>     u32 tgid;
>>     u32 msr_exit_ct;
>> };
>> BPF_HASH(kvm_msr_exit, unsigned int, struct kvm_msr_exit_info, 1024);
>>
>> TRACEPOINT_PROBE(kvm, kvm_msr) {
>>     unsigned int ct = args->ecx;
>>     if (ct >= 0xffffffff) {
>>         return -1;
>>     }
>>
>>     u32 pid = bpf_get_current_pid_tgid() >> 32;
>>     u32 tgid = bpf_get_current_pid_tgid();
>>
>>     struct kvm_msr_exit_info *exit_info;
>>     struct kvm_msr_exit_info init_exit_info = {};
>>
>>     exit_info = kvm_msr_exit.lookup(&ct);
>>     if (exit_info != NULL) {
>>         exit_info->pid = pid;
>>         exit_info->tgid = tgid;
>>         exit_info->msr_exit_ct++;
>>     } else {
>>         init_exit_info.pid = pid;
>>         init_exit_info.tgid = tgid;
>>         init_exit_info.msr_exit_ct = 1;
>>         kvm_msr_exit.update(&ct, &init_exit_info);
>>     }
>>     return 0;
>> }
>> """)
>>
>> Run a wrmsr(MSR_IA32_TSCDEADLINE, val) benchmark in the guest
>> (CPU: Intel Gold 5118):
>> case 1, no bpf on host:                 ~1127 cycles/wrmsr.
>> case 2, sample bpf on host with JIT:    ~1223 cycles/wrmsr. --> +8.5%
>> case 3, sample bpf on host without JIT: ~1312 cycles/wrmsr. --> +16.4%
>>
>> So, debug entries are more efficient than the bpf method.
>
> How much does host performance matter? E.g. does high overhead interfere
> with debug, is this something you want to have running at all times, etc...

The intention is to have long-running statistics for monitoring etc.

In general, eBPF, ftrace, etc. are okay for not-so-hot code paths, but this
one turned out to be very difficult to do efficiently without such a code
change.

Fam
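
A note on the quoted sample: it only installs the tracepoint probe. A
minimal, hypothetical driver loop to dump the per-MSR counts from the
Python side could look like the following; the `b` object is the one built
above, and the 5-second polling interval and output format are illustrative
assumptions, not part of the original patch discussion:

from time import sleep

# Assumes `b` is the BPF object built from the sample text above and that
# the script runs as root on the host (required to attach the tracepoint).
exit_map = b["kvm_msr_exit"]
while True:
    sleep(5)
    # Sort by exit count so the hottest MSRs print first.
    for msr, info in sorted(exit_map.items(),
                            key=lambda kv: kv[1].msr_exit_ct,
                            reverse=True):
        print("msr 0x%08x pid %-6d exits %d"
              % (msr.value, info.pid, info.msr_exit_ct))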
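
For clarity, the percentages in the quoted benchmark are overhead relative
to the no-bpf baseline; a quick sketch of the arithmetic, using the numbers
from the quoted results:

base = 1127  # cycles/wrmsr with no bpf attached (quoted baseline)
for label, cycles in (("bpf with JIT", 1223), ("bpf without JIT", 1312)):
    overhead = 100.0 * (cycles - base) / base
    print("%s: +%.1f%% (%d extra cycles/wrmsr)"
          % (label, overhead, cycles - base))
# -> bpf with JIT: +8.5% (96 extra cycles/wrmsr)
# -> bpf without JIT: +16.4% (185 extra cycles/wrmsr)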