On Mon, May 22, 2023 at 10:43:49AM -0700, Sean Christopherson wrote:
> On Fri, May 19, 2023, Pawan Gupta wrote:
> > On Thu, May 18, 2023 at 10:33:15AM -0700, Sean Christopherson wrote:
> > > I made the mistake of digging into why KVM doesn't advertise ARCH_CAP_FB_CLEAR_CTRL...
> > >
> > >   1. I see *nothing* in commit 027bbb884be0 ("KVM: x86/speculation: Disable Fill
> > >      buffer clear within guests") that justifies 1x RDMSR and 2x WRMSR on every
> > >      entry+exit.
> >
> > Unnecessary VERWs in guest will have much higher impact than due to MSR
> > read/write at vmentry/exit.
>
> Can you provide numbers for something closeish to a real world workload?

I am collecting the numbers, will update here soon.

> > On an Icelake system it is pointless for a guest to incur VERW penalty when
> > the system is not affected by MDS/TAA and guests don't need mitigation for
> > MMIO Stale Data. MSR writes are only done when the guest is likely to execute
> > unnecessary VERWs (e.g. when the guest thinks it's running on an older gen
> > CPU).
> >
> > > KVM just needs to context switch the MSR between guests since the value that's
> > > loaded while running in the host is irrelevant.  E.g. use a percpu cache to
> >
> > I will be happy to avoid the MSR read/write, but it's worth considering
> > that this MSR can receive more bits that host may want to toggle, then
> > percpu cache implementation would likely change.
>
> Change in and of itself isn't problematic, so long as whatever code we write won't
> fall over if/when new bits are added, i.e. doesn't clobber unknown bits.

Ok.

> > >   5. MSR_IA32_MCU_OPT_CTRL is not modified by the host after a CPU is brought up,
> > >      i.e. the host's desired value is effectively static post-boot, and barring
> > >      a buggy configuration (running KVM as a guest), the boot CPU's value will be
> > >      the same as every other CPU.
> >
> > Would the MSR value be same on every CPU, if only some guests have
> > enumerated FB_CLEAR and others haven't?
>
> Ignore the guest, I'm talking purely about the host.  Specifically, there's no
> reason to do a RDMSR to get the host value on every VM-Enter since the host's
> value is effectively static post-boot.

That's right (ignoring late microcode load adding stuff to the MSR or
msr-tools fiddling).

> > MSR writes (to disable FB_CLEAR) are not done when a guest enumerates
> > FB_CLEAR. Enumeration of FB_CLEAR in guest will depend on its configuration.
> >
> > >   6. Performance aside, KVM should not be speculating (ha!) on what the guest
> > >      will and will not do, and should instead honor whatever behavior is presented
> > >      to the guest.  If the guest CPU model indicates that VERW flushes buffers,
> > >      then KVM damn well needs to let VERW flush buffers.
> >
> > The current implementation allows guests to have VERW flush buffers when
> > they enumerate FB_CLEAR. It only restricts the flush behavior when the
> > guest is trying to mitigate against a vulnerability (like MDS) on
> > hardware that is not affected. I guess it's common for guests to be
> > running with older gen configuration on newer hardware.
>
> Right, I'm saying that that behavior is wrong.  KVM shouldn't assume the guest
> will do things a certain way and should instead honor the "architectural"
> definition, in quotes because I realize there probably is no architectural
> definition for any of this.

Before MMIO Stale Data, processors that were not affected by MDS/TAA did
not clear CPU buffers on VERW, even if they enumerated MD_CLEAR. On such
processors, guests that deployed VERW (thinking they were vulnerable to
MDS) did not clear the CPU buffers. After MMIO Stale Data was discovered,
FB_CLEAR_DIS was introduced to restore this behavior.

> It might be that the code does (unintentionally?) honor the "architecture", i.e.
> this code might actually be accurate with respect to when the guest can expect
> VERW to flush buffers.  But the comment is so, so wrong.

Agree, the comment needs to explain this well.
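
To make the two points discussed above concrete (the host value being
effectively static post-boot, and not clobbering unknown bits), here is a
rough userspace model of a per-CPU cache. It is only a sketch, not the KVM
code: fake_msr, fake_wrmsr(), cache_init() and load_guest_mcu_opt_ctrl()
are hypothetical names, and the FB_CLEAR_DIS bit position is an assumption
made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of FB_CLEAR_DIS in MSR_IA32_MCU_OPT_CTRL. */
#define FB_CLEAR_DIS	(1ULL << 3)

/* Hypothetical stand-ins for the hardware MSR on a single CPU. */
static uint64_t fake_msr;		/* models MSR_IA32_MCU_OPT_CTRL */
static unsigned int wrmsr_count;	/* number of simulated WRMSRs */

/* Hypothetical per-CPU state. */
static uint64_t host_val;		/* host value, read once at bringup */
static uint64_t cached_val;		/* value currently loaded in the MSR */

static void fake_wrmsr(uint64_t val)
{
	fake_msr = val;
	wrmsr_count++;
}

/* Models the single RDMSR done when the CPU is brought up. */
static void cache_init(uint64_t boot_val)
{
	fake_msr = boot_val;
	host_val = boot_val;
	cached_val = boot_val;
	wrmsr_count = 0;
}

/*
 * Called before entering a guest. Only FB_CLEAR_DIS is ever changed, so
 * any other (possibly unknown) bits in the host value are preserved, and
 * the MSR is written only when the wanted value differs from what is
 * already loaded.
 */
static void load_guest_mcu_opt_ctrl(bool disable_fb_clear)
{
	uint64_t want = disable_fb_clear ? (host_val | FB_CLEAR_DIS) : host_val;

	if (want != cached_val) {
		fake_wrmsr(want);
		cached_val = want;
	}
}
```

With this shape, back-to-back entries into guests that want the same
setting cost no MSR access at all. Whether the value left loaded while
running in the host really is irrelevant is the open question from the
thread; the sketch simply assumes it is.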

> 	/*
> 	 * If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
> 	 * at VMEntry. Skip the MSR read/write when a guest has no use case to
> 	 * execute VERW.
> 	 */
> 	if ((vcpu->arch.arch_capabilities & ARCH_CAP_FB_CLEAR) ||
> 	   ((vcpu->arch.arch_capabilities & ARCH_CAP_MDS_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_TAA_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_PSDP_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_FBSDP_NO) &&
> 	    (vcpu->arch.arch_capabilities & ARCH_CAP_SBDR_SSDP_NO)))
> 		vmx->disable_fb_clear = false;
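
For reference, the condition quoted above can be pulled out into a small
standalone predicate, which makes it easier to see which guest
configurations end up with FB_CLEAR_DIS left alone. This is a sketch
only: update_disable_fb_clear() and GUEST_NO_VERW_CAPS are hypothetical
names, the bit positions are what I believe msr-index.h defines and
should be treated as assumptions, and the initial value of
disable_fb_clear comes from host-side checks that are not part of the
quoted snippet, so it is passed in as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))

/* Assumed IA32_ARCH_CAPABILITIES bit positions, for illustration only. */
#define ARCH_CAP_MDS_NO		BIT_ULL(5)
#define ARCH_CAP_TAA_NO		BIT_ULL(8)
#define ARCH_CAP_SBDR_SSDP_NO	BIT_ULL(13)
#define ARCH_CAP_FBSDP_NO	BIT_ULL(14)
#define ARCH_CAP_PSDP_NO	BIT_ULL(15)
#define ARCH_CAP_FB_CLEAR	BIT_ULL(17)

/* Hypothetical helper mask: guest sees no reason to execute VERW at all. */
#define GUEST_NO_VERW_CAPS	(ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO |	\
				 ARCH_CAP_PSDP_NO | ARCH_CAP_FBSDP_NO |	\
				 ARCH_CAP_SBDR_SSDP_NO)

/*
 * Mirrors the quoted condition: clear the flag when the guest enumerates
 * FB_CLEAR, or when the guest enumerates all of the *_NO bits. Otherwise
 * the incoming value is returned unchanged.
 */
static bool update_disable_fb_clear(bool disable_fb_clear,
				    uint64_t guest_arch_caps)
{
	if (guest_arch_caps & ARCH_CAP_FB_CLEAR)
		return false;

	if ((guest_arch_caps & GUEST_NO_VERW_CAPS) == GUEST_NO_VERW_CAPS)
		return false;

	return disable_fb_clear;
}
```

For example, a guest that enumerates only MDS_NO and TAA_NO (but not the
MMIO Stale Data bits) keeps whatever value the host-side checks chose,
while a guest that enumerates FB_CLEAR always gets false.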