Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 22, 2023 at 10:43:49AM -0700, Sean Christopherson wrote:
> On Fri, May 19, 2023, Pawan Gupta wrote:
> > On Thu, May 18, 2023 at 10:33:15AM -0700, Sean Christopherson wrote:
> > > I made the mistake of digging into why KVM doesn't advertise ARCH_CAP_FB_CLEAR_CTRL...
> > > 
> > >   1. I see *nothing* in commit 027bbb884be0 ("KVM: x86/speculation: Disable Fill
> > >      buffer clear within guests") that justifies 1x RDMSR and 2x WRMSR on every
> > >      entry+exit.
> > 
> > Unnecessary VERWs in guest will have much higher impact than due to MSR
> > read/write at vmentry/exit.
> 
> Can you provide numbers for something closeish to a real world workload?

I am collecting the numbers, will update here soon.

> > On an Icelake system it is pointless for a guest to incur VERW penalty when
> > the system is not affected by MDS/TAA and guests don't need mitigation for
> > MMIO Stale Data. MSR writes are only done when the guest is likely to execute
> > unnecessary VERWs(e.g. when the guest thinks its running on an older gen
> > CPU).
> >
> > >      KVM just needs to context switch the MSR between guests since the value that's
> > >      loaded while running in the host is irrelevant.  E.g. use a percpu cache to
> > 
> > I will be happy to avoid the MSR read/write, but its worth considering
> > that this MSR can receive more bits that host may want to toggle, then
> > percpu cache implementation would likely change.
> 
> Change in and of itself isn't problematic, so long as whatever code we write won't
> fall over if/when new bits are added, i.e. doesn't clobber unknown bits.

Ok.

> > >   5. MSR_IA32_MCU_OPT_CTRL is not modified by the host after a CPU is brought up,
> > >      i.e. the host's desired value is effectively static post-boot, and barring
> > >      a buggy configuration (running KVM as a guest), the boot CPU's value will be
> > >      the same as every other CPU.
> > 
> > Would the MSR value be same on every CPU, if only some guests have
> > enumerated FB_CLEAR and others haven't?
> 
> Ignore the guest, I'm talking purely about the host.  Specifically, there's no
> reason to do a RDMSR to get the host value on every VM-Enter since the host's
> value is effectively static post-boot.

That right(ignoring late microcode load adding stuff to the MSR or
msr-tools fiddling).

> > MSR writes (to disable FB_CLEAR) are not done when a guest enumerates
> > FB_CLEAR. Enumeration of FB_CLEAR in guest will depend on its configuration.
> > 
> > >   6. Performance aside, KVM should not be speculating (ha!) on what the guest
> > >      will and will not do, and should instead honor whatever behavior is presented
> > >      to the guest.  If the guest CPU model indicates that VERW flushes buffers,
> > >      then KVM damn well needs to let VERW flush buffers.
> > 
> > The current implementation allows guests to have VERW flush buffers when
> > they enumerate FB_CLEAR. It only restricts the flush behavior when the
> > guest is trying to mitigate against a vulnerability(like MDS) on a
> > hardware that is not affected. I guess its common for guests to be
> > running with older gen configuration on a newer hardware.
> 
> Right, I'm saying that that behavior is wrong.  KVM shouldn't assume the guest
> the guest will do things a certain way and should instead honor the "architectural"
> definition, in quotes because I realize there probably is no architectural
> definition for any of this.

Before MMIO Stale Data, processors that were not affected by MDS/TAA did
not clear CPU buffers, even if they enumerated MD_CLEAR. On such
processors guests that deployed VERW(thinking they are vulnerable to
MDS) did not clear the CPU buffers. After MMIO Stale Data was discovered
FB_CLEAR_DIS was introduced to restore this behavior.

> It might be that the code does (unintentionally?) honor the "architecture", i.e.
> this code might actually be accurrate with respect to when the guest can expect
> VERW to flush buffers.  But the comment is so, so wrong.

Agree, the comment needs to explain this well.

> 	/*
> 	 * If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
> 	 * at VMEntry. Skip the MSR read/write when a guest has no use case to
> 	 * execute VERW.
> 	 */
> 	if ((vcpu->arch.arch_capabilities & ARCH_CAP_FB_CLEAR) ||
> 	   ((vcpu->arch.arch_capabilities & ARCH_CAP_MDS_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_TAA_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_PSDP_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_FBSDP_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_SBDR_SSDP_NO)))
> 		vmx->disable_fb_clear = false;



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux