On Fri, May 26, 2023 at 09:09:08AM -0700, Sean Christopherson wrote: > On Fri, May 26, 2023, Yan Zhao wrote: > > On Thu, May 25, 2023 at 07:53:41AM -0700, Sean Christopherson wrote: > > > > > > > So, if KVM zaps SPTEs when CR0.CD is cleared (even when the quirk is enabled), > > > > > > > then KVM can skip the MTRR zaps when CR0.CD=1 because KVM is ignoring the MTRRs > > > > > > > and will zap when CR0.CD is cleared. And to avoid regressing the CR0.CD case, > > > > > > > if KVM honors guest PAT for the bizarro CR0.CD quirk, then KVM only needs to > > > > > > > zap non-WB MTRR ranges when CR0.CD is cleared. Since WB is weak, looking for > > > > Not only non-WB ranges are required to be zapped. > > > > Think about a scenario: > > > > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too. > > > > (2) after CR0.CD=1, without zap, its EPT entry memtype is still UC. > > > > (3) then guest performs MTRR disable, no zap too. > > > > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable. > > > > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB. > > > > > > KVM installs WB memtype when the quirk is enabled, thus no need to zap. KVM > > > currently zaps everything when the quirk is disabled, and I'm not proposing that > > > be changed. > > I didn't explain it clearly. > > > > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too. ==> EPT entry has been created here > > (2) after CR0.CD=1, because of the quirk, no zap, the created EPT entry memtype is still UC. > > (3) then guest performs MTRR disable, no zap too, according to our change. > > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable. > > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.==>we also need to zap WB range. > > Ugh, right. But that case can be handled by zapping non-WB ranges on CR0.CD being > set. Hmm, and KVM would need to zap everything if CR0.CD is toggled with MTRRs Ok. Actually even with below "zap everything always", I didn't observe any performance downgrade on OVMF in my side regarding to this change as we skipped EPT zap in update_mtrr() when CR0.CD=1. (even 3 less zaps for vCPU 0 and 1 less zap for APs as initially there are several MTRRs disabled when CR0.CD=1 without accompanying CR0.CD toggle). diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dadf80fb85e9..ad3c4dc9d7bf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -941,9 +941,9 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon if (((cr0 ^ old_cr0) & X86_CR0_CD) && - kvm_mmu_honors_guest_mtrr(vcpu->kvm) && - !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED) + kvm_mmu_honors_guest_mtrr(vcpu->kvm) kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL); } EXPORT_SYMBOL_GPL(kvm_post_set_cr0); > disabled, but I don't think OVMF ever does that. Zapping on CR0.CD toggling would > would likely introduce a small performance hit for SeaBIOS due to SeaBIOS clearing > CR0.CD immediately after SIPI, i.e. with MTRRs disabled, but that's arguably a For SeaBIOS, I also observed 1 less of EPT zap for each vCPU, because there are 3 more MTRR toggles than CR0.CD toggles for each vCPU, and only 2 MTRR toggles happen when CR0.CD=0. > correctness fix since the quirk means KVM incorrectly runs the vCPU with WB SPTEs > until MTRRs are programmed. > With precise+batched zapping, zapping non-WB ranges even when CR0.CD is set should > still be a healthy performance boost for OVMF. Ok. I'll try to implement in this way in the next version. > > (2) return MTRR_TYPE_WRBACK if guest mtrr has not been enabled for once > > u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) > > @@ -628,13 +635,23 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn) > > struct mtrr_iter iter; > > u64 start, end; > > int type = -1; > > const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK) > > | (1 << MTRR_TYPE_WRTHROUGH); > > > > + if (!mtrr_state->enabled_once) > > + return MTRR_TYPE_WRBACK; > > No, because this assumes too many things about the guest, and completely falls > apart if the guest reboots. Ah, right. > I am not willing to lean on the historical commit messages for the quirk to > justify any change. I'm not at all convinced that there was any meaningful thought > put into ensuring correctness. > > > we can return MTRR_TYPE_WRBACK for CR0.CD=1 only when MTRR is not enbled for > > once. (And I tried, it works!) > > On your system, for your setup. The quirk terrifies me because it likely affects > every KVM-based VM out there (I can't find a single VMM that disables the quirk). > These changes are limited to VMs with noncoherent DMA, but still. > > In short, I am steadfastly against any change that piles more arbitrary behavior > functional behavior on top of the quirk, especially when that behavior relies on > heuristics to try and guess what the guest is doing. Ok, yes, this is the right insistence.