From: Like Xu <likexu@xxxxxxxxxxx>

When the vcpu matches kvm_get_running_vcpu(), use get_cpl() directly to
return the exact current state to callers of the vcpu_in_kernel API.

When a VM workload is profiled via perf-kvm, the value of
vcpu->arch.preempted_in_kernel is not strictly synchronised with the
current vcpu CPL.

This affects perf/core's ability to use the kvm_guest_state() API to tag
the guest RIP with PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in
the sample. As a result, perf/tool fails to connect the vcpu RIPs to the
guest kernel space symbols when parsing these samples, due to incorrect
PERF_RECORD_MISC flags:

Before (perf-report of a cpu-cycles sample):
  1.23%  :58945  [unknown]  [u] 0xffffffff818012e0

Given the semantics of preempted_in_kernel, it may not be easy (w/o
extra effort) to reconcile changes between preempted_in_kernel and CPL
values. Therefore, to make this API more trustworthy, fall back to using
get_cpl() directly when the vcpu is loaded:

After:
  1.35%  :60703  [kernel.vmlinux]  [g] asm_exc_page_fault

Further performance optimisation is clearly possible; correcting the
accuracy takes priority as a first step.

Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
 arch/x86/kvm/x86.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..c454df904a45 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13031,7 +13031,10 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_state_protected)
 		return true;
 
-	return vcpu->arch.preempted_in_kernel;
+	if (vcpu != kvm_get_running_vcpu())
+		return vcpu->arch.preempted_in_kernel;
+
+	return static_call(kvm_x86_get_cpl)(vcpu) == 0;
 }
 
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)

base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
-- 
2.43.0