Re: tlb_flush stat on Intel/AMD

On Mon, Jul 20, 2020 at 12:08:05PM -0700, Jacob Xu wrote:
> Hi all,
> 
> I wrote a kvm selftest to enter a 1-vcpu guest VM running a nop in one
> thread and check the VM's debugfs entry for # of tlb flushes while
> that was running. Installing the latest upstream kernel and running
> this on an intel host showed a tlb_flush count of 30, while running it
> on an amd host shows the tlb_flush count at 0.
> 
> Do we have an inconsistency between Intel and AMD in how VCPU_STAT is
> incremented?

Yes, you can even drop "between Intel and AMD" and just state "we have
inconsistencies in how VCPU_STAT is incremented".

> From browsing the code, we see that the stat only gets incremented in
> the kvm_ wrappers of the x86_ops functions tlb_flush_all,
> tlb_flush_current, and tlb_flush_guest. These wrappers are only called
> via the KVM request api (referred to as deferred flushes in some other
> threads), and there are other instances of calling the x86_ops tlb_flush
> methods directly (immediate flush).
> 
> It looks like most of the tlb flush calls are deferred, but there are
> a few instances using the immediate flush where it's not counted
> (kvm_mmu_load, svm_set_cr4, vmx_set_apic_access_page,
> nested_prepare_vmcb_control). Is there a guideline on when to use a
> deferred vs. an immediate tlb_flush?

The rule of thumb is to defer the flush whenever possible so that KVM
doesn't flush multiple times for a single VM-Exit.  However, of the paths
listed above, svm_set_cr4() is the only one that can be deferred, because
kvm_mmu_load() and vmx_set_apic_access_page() are reachable after
KVM_REQ_TLB_FLUSH_CURRENT has already been processed in vcpu_enter_guest().

The VMX APIC flush could be deferred by hoisting KVM_REQ_APIC_PAGE_RELOAD
up.  I think that's safe?  But it's a very infrequent operation so I'm not
exactly chomping at the bit to get it fixed.

> Could this be a cause for the lower tlb_flush stat seen on an AMD
> host? Or perhaps there's another reason for the difference due to the
> (too) simple selftest?

Yes, but it's likely not due to any of the paths listed above.  Obviously
accounting the flush in kvm_mmu_load() will elevate the count, but it will
affect VMX and SVM identically.

Given that you see 0 on SVM and a low number on VMX, my money is on the
difference being that VMX accounts the TLB flush that occurs on vCPU
migration.  vmx_vcpu_load_vmcs() makes a KVM_REQ_TLB_FLUSH request, whereas
svm_vcpu_load() resets asid_generation but doesn't increment the stats.
 
> In the case of svm_tlb_flush, it seems like the tlb flush is deferred
> anyway since the response to setting a tlb flush control bit in the
> VMCB is not acted upon until entering the guest. So it seems we could
> count tlb flushes on svm more easily by incrementing the counter by
> checking the control bit before KVM_RUN. Though perhaps there's
> another case we'd like to count as tlb flush when the guest switches
> ASID (where would we track this?).
>
> Would switching to this alternative for incrementing tlb_flush stat in
> svm be much different than what we do right now?

I think a better option is to keep the current accounting, defer flushes
when possible to naturally fix accounting, and then fix the remaining one
off cases, e.g. kvm_mmu_load() and svm_vcpu_load().

I can prep a small series unless you want the honors?


