On 5/23/2023 2:21 PM, Yan Zhao wrote: [...]
Yes, leaving each vCPU's MTRR to mimic HW. As also suggested in SDM, the guest OS manipulates MTRRs in this way: for each online CPUs { disable MTRR update fixed/var MTRR ranges enable MTRR } Guest OS needs to access memory only after this full pattern.
Thanks for confirmation and clarification.
So, I think there should not have "hazard of inconsistency among per-VPU MTRR states". I want to have per-VM MTRR state is because I want to reduce unnessary EPT zap, which costs quite a lot cpu cycles even when the EPT is empty. In this patch, per-VM MTRR pointer is used to point to vCPU 0's MTRR state, so that it can save some memory to keep the MTRR state. But I found out that it would only work when vCPU 0 (boot processor) is always online (which is not true for x86 under some configration). I'll try to find out lowest online vCPU and keep a per-VM copy of MTRR state in next version. Thanks!
IIUC, your saving comes from skips the intermediate state during boot, when APs goes through setting MTRR, which would cause SPTE zap before your this patch set.
MHO was, now that your ignores other vCPU's MTRR settings (unless it is different from BP's MTRR?), why not let each vCPU's MTRR set/update directly set to the per-VM MTRR states (if differs from current value). It's guest OS/BIOS's responsibility to keep the consistency anyway. And even if the malfunction caused by the inconsistency might differ from that of native, SDM doesn't clearly state how the malfunction should be, does it? that's to say, anyone knows, when inconsistency happens, does it cause that logical processor malfunction or in fact it impacts the global MTRR settings? If the latter, I think leaving only the per-VM MTRR state aligns with native.
BTW, with regard to KVM_X86_QUIRK_CD_NW_CLEARED, I see svm honors it while vmx doesn't before it clear CR0.CD/NW.
svm_set_cr0(): if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) hcr0 &= ~(X86_CR0_CD | X86_CR0_NW); vmx_set_cr0(): hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF); Perhaps vmx side can be fixed passingly?