On Mon, Jan 20, 2025, Kenta Ishiguro wrote:
> Thank you for your comments.
> I understood the scenario of why my previous change was unsafe.
>
> Also, I like the concept of KVM_VCPU_IN_HOST for tracking whether a vCPU is
> scheduled out because it can be helpful to understand the vCPU's situation
> from guests.
>
> I have tested the attached changes, but I found that the performance
> improvement was somewhat limited.  The boundary-checking code prevents
> KVM from setting KVM_VCPU_IN_HOST and KVM_VCPU_PREEMPTED, which may be
> contributing to this limitation.  I think this is a conservative
> approach to avoid using stale TLB entries.
> I referred to this patch:
> https://lore.kernel.org/lkml/20220614021116.1101331-1-sashal@xxxxxxxxxx/
> Since PV waiting causes the most significant overhead, is it possible to
> allow guests to perform PV flush if vCPUs are PV waiting and scheduled out?

Hmm, yes?  The "right now kvm_vcpu_check_block() is doing memory accesses"
issue was mostly addressed by commit 26844fee6ade ("KVM: x86: never write to
memory from kvm_vcpu_check_block()").

  It would in principle be okay to report the vCPU as preempted also
  if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
  vmentry/vmexit overhead unnecessarily, and optimistic spinning is
  also unlikely to succeed.  However, leave it for later because right
  now kvm_vcpu_check_block() is doing memory accesses.  Even though the
  TLB flush issue only applies to virtual memory address, it's very
  much preferrable to be conservative.

I say mostly because there are technically reads to guest memory via cached
objects, specifically vmx_has_nested_events()'s use of the nested Posted
Interrupt Descriptor (PID).  But a vCPU's PID is referenced directly via
physical address in the VMCS, so there are no TLB flushing concerns on that
front.

I think this would be safe/correct.  Key word "think".

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5193c3dfbce1..691cf4cfe5d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2350,16 +2350,6 @@ static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
 		irq->delivery_mode == APIC_DM_LOWEST);
 }
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
-{
-	kvm_x86_call(vcpu_blocking)(vcpu);
-}
-
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
-{
-	kvm_x86_call(vcpu_unblocking)(vcpu);
-}
-
 static inline int kvm_cpu_get_apicid(int mps_cpu)
 {
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f3ad13a8ac7..4f8d0b000e93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11170,6 +11170,18 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.at_instruction_boundary = true;
+	kvm_x86_call(vcpu_blocking)(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	kvm_x86_call(vcpu_unblocking)(vcpu);
+	vcpu->arch.at_instruction_boundary = false;
+}
+
 /* Called within kvm->srcu read side. */
 static inline int vcpu_block(struct kvm_vcpu *vcpu)
 {
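
Side note, in case it helps map the diff back to Kenta's observation: the
boundary check in question boils down to something like this (my paraphrase
of the sched-out path, not the exact upstream code):

	/*
	 * Sketch only, details elided: KVM refuses to report a vCPU as
	 * preempted unless it was scheduled out on an instruction boundary,
	 * so the guest never skips a flush IPI while an instruction is only
	 * partially emulated.  With the hunks above, a vCPU sleeping in
	 * kvm_vcpu_block() also satisfies this check.
	 */
	if (!vcpu->arch.at_instruction_boundary)
		return;		/* don't set KVM_VCPU_PREEMPTED */

	/* ... mark KVM_VCPU_PREEMPTED in the guest's steal time area ... */

I.e. treating the vCPU as being at an instruction boundary for the entire
duration of the block is what allows the preempted/in-host bits to be set
while it is PV waiting.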