On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> Add 3 new tracepoints for nested VM exits which are intended
> to capture extra information to gain insights about the nested guest
> behavior.
>
> The new tracepoints are:
>
> - kvm_nested_msr
> - kvm_nested_hypercall

I 100% agree that not having register state in the exit tracepoints is
obnoxious, but I don't think we should add one-off tracepoints for the most
annoying cases.  I would much prefer to figure out a way to capture register
state in kvm_entry and kvm_exit.  E.g. I've lost track of the number of times
I've observed an MSR exit without having trace_kvm_msr enabled.

One idea would be to capture E{A,B,C,D}X, which would cover MSRs, CPUID, and
most hypercalls.  And then we might even be able to drop the dedicated MSR and
CPUID tracepoints (not sure if that's a good idea).

Side topic, arch/s390/kvm/trace.h has the concept of COMMON information that
is captured for multiple tracepoints.  I haven't looked closely, but I gotta
imagine we can/should use a similar approach for x86.

> These tracepoints capture extra register state to be able to know
> which MSR or which hypercall was done.
>
> - kvm_nested_page_fault
>
> This tracepoint allows to capture extra info about which host pagefault
> error code caused the nested page fault.

The host error code, a.k.a. qualification info, is readily available in the
kvm_exit (or nested variant) tracepoint.  I don't think letting userspace skip
a tracepoint that's probably already enabled is worth the extra code needed to
support this tracepoint.

The nested_svm_inject_npf_exit() code in particular is wonky, and I think it's
a good example of why userspace "needs" trace_kvm_exit, e.g. to observe that a
nested stage-2 page fault didn't originate from a hardware stage-2 fault.
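
To make the E{A,B,C,D}X idea a bit more concrete, below is a rough userspace
sketch, not kernel code and not a proposed patch.  It only illustrates why a
common four-register snapshot is enough to decode MSR, CPUID and most KVM
hypercall exits.  All names here (EXIT_COMMON_FIELDS, struct exit_record,
enum exit_kind, exit_describe()) are hypothetical and exist purely for
illustration; a real version would presumably live in the kvm_entry/kvm_exit
TRACE_EVENT definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical "common fields" macro: every record type that embeds it
 * carries the same register snapshot, so one decoder works for all of them.
 */
#define EXIT_COMMON_FIELDS \
	uint32_t eax;      \
	uint32_t ebx;      \
	uint32_t ecx;      \
	uint32_t edx

/* Hypothetical exit kinds; these are NOT real VMX/SVM exit reason values. */
enum exit_kind { EXIT_RDMSR, EXIT_WRMSR, EXIT_CPUID, EXIT_HYPERCALL };

struct exit_record {
	enum exit_kind kind;
	EXIT_COMMON_FIELDS;
};

/* RDMSR and WRMSR both take the MSR index in ECX. */
static uint32_t exit_msr_index(const struct exit_record *r)
{
	return r->ecx;
}

/* WRMSR writes the 64-bit value held in EDX:EAX. */
static uint64_t exit_wrmsr_value(const struct exit_record *r)
{
	return ((uint64_t)r->edx << 32) | r->eax;
}

/* Format a one-line description using only the common register snapshot. */
static int exit_describe(const struct exit_record *r, char *buf, size_t len)
{
	assert(r && buf);

	switch (r->kind) {
	case EXIT_RDMSR:
		return snprintf(buf, len, "rdmsr msr=0x%" PRIx32,
				exit_msr_index(r));
	case EXIT_WRMSR:
		return snprintf(buf, len,
				"wrmsr msr=0x%" PRIx32 " data=0x%" PRIx64,
				exit_msr_index(r), exit_wrmsr_value(r));
	case EXIT_CPUID:
		/* CPUID takes the leaf in EAX and the subleaf in ECX. */
		return snprintf(buf, len,
				"cpuid leaf=0x%" PRIx32 " subleaf=0x%" PRIx32,
				r->eax, r->ecx);
	case EXIT_HYPERCALL:
		/* KVM hypercalls: number in RAX, first args in RBX/RCX/RDX. */
		return snprintf(buf, len,
				"hypercall nr=0x%" PRIx32 " a0=0x%" PRIx32
				" a1=0x%" PRIx32 " a2=0x%" PRIx32,
				r->eax, r->ebx, r->ecx, r->edx);
	}
	return -1;
}
```

The sketch records only the low 32 bits of each register, which is what
E{A,B,C,D}X implies.  The fourth KVM hypercall argument (RSI) and the upper
halves of 64-bit arguments would not be captured, hence "most" hypercalls.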