Re: [PATCH v5 3/3] KVM: x86: add new nested vmexit tracepointsg

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 18 Dec 2024 13:14:01 -0800

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> Add 3 new tracepoints for nested VM exits which are intended
> to capture extra information to gain insights about the nested guest
> behavior.
> 
> The new tracepoints are:
> 
> - kvm_nested_msr
> - kvm_nested_hypercall

I 100% agree that not having register state in the exit tracepoints is obnoxious,
but I don't think we should add one-off tracepoints for the most annoying cases.
I would much prefer to figure out a way to capture register state in kvm_entry
and kvm_exit.  E.g. I've lost track of the number of times I've observed an MSR
exit without having trace_kvm_msr enabled.

One idea would be to capture E{A,B,C,D}X, which would cover MSRs, CPUID, and
most hypercalls.  And then we might even be able to drop the dedicated MSR and
CPUID tracepoints (not sure if that's a good idea).

Side topic, arch/s390/kvm/trace.h has the concept of COMMON information that is
captured for multiple tracepoints.  I haven't looked closely, but I gotta imagine
we can/should use a similar approach for x86.

> These tracepoints capture extra register state to be able to know
> which MSR or which hypercall was done.
> 
> - kvm_nested_page_fault
> 
> This tracepoint allows to capture extra info about which host pagefault
> error code caused the nested page fault.

The host error code, a.k.a. qualification info, is readily available in the
kvm_exit (or nested variant) tracepoint.  I don't letting userspace skip a
tracepoint that's probably already enabled is worth the extra code to support
this tracepoint.  The nested_svm_inject_npf_exit() code in particular is wonky,
and I think it's a good example of why userspace "needs" trace_kvm_exit, e.g. to
observe that a nested stage-2 page fault didn't originate from a hardware stage-2
fault.