On Wed, May 19, 2021, Stefano De Venuto wrote:
> The kvm_entry and kvm_exit tracepoints are still quite far from the
> actual VMEnters/VMExits. This means that in a trace we can find host
> events after a kvm_entry event and before a kvm_exit one, as in this
> example:
>
> trace.dat:            CPU 0/KVM-4594 [001] 2.167191: kvm_entry:
> trace.dat:            CPU 0/KVM-4594 [001] 2.167192: write_msr: 48, value 0
> trace.dat:            CPU 0/KVM-4594 [001] 2.167192: rcu_utilization: Start context switch
> trace.dat:            CPU 0/KVM-4594 [001] 2.167192: rcu_utilization: End context switch
> trace-tumbleweed.dat: <idle>-0       [000] 2.167196: hrtimer_cancel:
> trace-tumbleweed.dat: <idle>-0       [000] 2.167197: hrtimer_expire_entry:
> trace-tumbleweed.dat: <idle>-0       [000] 2.167201: hrtimer_expire_exit:
> trace-tumbleweed.dat: <idle>-0       [000] 2.167201: hrtimer_start:
> trace.dat:            CPU 0/KVM-4594 [001] 2.167203: read_msr: 48, value 0
> trace.dat:            CPU 0/KVM-4594 [001] 2.167203: write_msr: 48, value 4
> trace.dat:            CPU 0/KVM-4594 [001] 2.167204: kvm_exit:
>
> This patch moves the tracepoints closer to the events, for both Intel
> and AMD, so that a combined host-guest trace will offer a more
> realistic representation of what is really happening, as shown here:
>
> trace.dat:            CPU 0/KVM-2553 [000] 2.190290: write_msr: 48, value 0

I'm not sure this is a good thing, as it's not clear to me that invoking
tracing with the guest's SPEC_CTRL loaded is desirable.  Maybe it's a
non-issue, but it should be explicitly called out and discussed.

And to some extent, the current behavior is _more_ accurate because it
shows that KVM started its VM-Enter sequence and then the WRMSR occurred
as part of that sequence.  It is writing the guest's value after all.
Ditto for XCR0, XSS, PKRU, Intel PT, etc...

A more concrete example would be perf; on VMX, if a perf NMI happens
after KVM invokes atomic_switch_perf_msrs() then I absolutely want to
see that reflected in the trace, e.g. to help debug the PEBS mess[*].
If the VM-Enter tracepoint is moved closer to VM-Enter, that may or may
not hold true depending on where the NMI lands.

On VMX, I think the tracepoint can be moved below the VMWRITEs without
much contention (though doing so is likely a nop), but moving it below
kvm_load_guest_xsave_state() requires a bit more discussion.

I 100% agree that the current behavior can be a bit confusing, but I
wonder if we'd be better off "solving" that problem through
documentation.

[*] https://lkml.kernel.org/r/20210209225653.1393771-1-jmattson@xxxxxxxxxx

> trace.dat:            CPU 0/KVM-2553 [000] 2.190290: rcu_utilization: Start context switch
> trace.dat:            CPU 0/KVM-2553 [000] 2.190290: rcu_utilization: End context switch
> trace.dat:            CPU 0/KVM-2553 [000] 2.190290: kvm_entry:
> trace-tumbleweed.dat: <idle>-0       [000] 2.190290: write_msr:
> trace-tumbleweed.dat: <idle>-0       [000] 2.190290: cpu_idle:
> trace.dat:            CPU 0/KVM-2553 [000] 2.190291: kvm_exit:
> trace.dat:            CPU 0/KVM-2553 [000] 2.190291: read_msr: 48, value 0
> trace.dat:            CPU 0/KVM-2553 [000] 2.190291: write_msr: 48, value 4
>
> Signed-off-by: Stefano De Venuto <stefano.devenuto99@xxxxxxxxx>
> Signed-off-by: Dario Faggioli <dfaggioli@xxxxxxxx>
> ---

...
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4bceb5ca3a89..33c732101b83 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6661,6 +6661,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  {
>  	kvm_guest_enter_irqoff();
>
> +	trace_kvm_entry(vcpu);
> +
>  	/* L1D Flush includes CPU buffer clear to mitigate MDS */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);
> @@ -6675,6 +6677,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>
>  	vcpu->arch.cr2 = native_read_cr2();
>
> +	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
> +	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);

This is wrong in the 'vmx->fail == true' case (see below the diff for an
untested sketch of one way to handle it).

> +
>  	kvm_guest_exit_irqoff();
>  }
>
> @@ -6693,8 +6698,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	if (vmx->emulation_required)
>  		return EXIT_FASTPATH_NONE;
>
> -	trace_kvm_entry(vcpu);
> -
>  	if (vmx->ple_window_dirty) {
>  		vmx->ple_window_dirty = false;
>  		vmcs_write32(PLE_WINDOW, vmx->ple_window);
> @@ -6814,15 +6817,12 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  		return EXIT_FASTPATH_NONE;
>  	}
>
> -	vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
>  	if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
>  		kvm_machine_check();
>
>  	if (likely(!vmx->exit_reason.failed_vmentry))
>  		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>
> -	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
> -
>  	if (unlikely(vmx->exit_reason.failed_vmentry))
>  		return EXIT_FASTPATH_NONE;
>
> --
> 2.31.1
>
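To spell out the 'vmx->fail' problem: vmx->fail is set when the
VMLAUNCH/VMRESUME instruction itself fails, and in that case no VM-Exit
occurred, so VM_EXIT_REASON holds stale data and reading it
unconditionally here traces garbage.  A completely untested sketch,
assuming the 0xdead sentinel that vmx_vcpu_run() already uses for the
failed VM-Enter path:

	/* VM_EXIT_REASON is stale if VMLAUNCH/VMRESUME itself failed. */
	if (unlikely(vmx->fail))
		vmx->exit_reason.full = 0xdead;
	else
		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);

Whether kvm_exit should fire at all on a failed VM-Enter is a separate
question (the current code bails before reaching the tracepoint); the
point is only that VM_EXIT_REASON can't be read unconditionally.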