Re: [PATCH 1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes

Dongli Zhang <dongli.zhang@xxxxxxxxxx> · Thu, 14 Mar 2024 03:31:10 -0700

On 3/12/24 10:08, Sean Christopherson wrote:
> On Mon, Mar 11, 2024, Dongli Zhang wrote:
>>
>>
>> On 3/8/24 17:09, Sean Christopherson wrote:
>>> Remove KVM's support for virtualizing guest MTRR memtypes, as full MTRR
>>> adds no value, negatively impacts guest performance, and is a maintenance
>>> burden due to it's complexity and oddities.
>>>
>>> KVM's approach to virtualizating MTRRs make no sense, at all.  KVM *only*
>>> honors guest MTRR memtypes if EPT is enabled *and* the guest has a device
>>> that may perform non-coherent DMA access.  From a hardware virtualization
>>> perspective of guest MTRRs, there is _nothing_ special about EPT.  Legacy
>>> shadowing paging doesn't magically account for guest MTRRs, nor does NPT.
>>
>> [snip]
>>
>>>  
>>> -bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma)
>>> +bool kvm_mmu_may_ignore_guest_pat(void)
>>>  {
>>>  	/*
>>> -	 * If host MTRRs are ignored (shadow_memtype_mask is non-zero), and the
>>> -	 * VM has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is
>>> -	 * to honor the memtype from the guest's MTRRs so that guest accesses
>>> -	 * to memory that is DMA'd aren't cached against the guest's wishes.
>>> -	 *
>>> -	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
>>> -	 * e.g. KVM will force UC memtype for host MMIO.
>>> +	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
>>> +	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>>> +	 * honor the memtype from the guest's PAT so that guest accesses to
>>> +	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
>>> +	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
>>> +	 * KVM _always_ ignores guest PAT (when EPT is enabled).
>>>  	 */
>>> -	return vm_has_noncoherent_dma && shadow_memtype_mask;
>>> +	return shadow_memtype_mask;
>>>  }
>>>  
>>
>> Any special reason to use the naming 'may_ignore_guest_pat', but not
>> 'may_honor_guest_pat'?
> 
> Because which (after this series) is would either be misleading or outright wrong.
> If KVM returns true from the helper based solely on shadow_memtype_mask, then it's
> misleading because KVM will *always* honors guest PAT for such CPUs.  I.e. that
> name would yield this misleading statement.
> 
>   If the CPU supports self-snoop, KVM may honor guest PAT.
> 
> If KVM returns true iff self-snoop is NOT available (as proposed in this series),
> then it's outright wrong as KVM would return false, i.e. would make this incorrect
> statement:
> 
>   If the CPU supports self-snoop, KVM never honors guest PAT.
> 
> As saying that KVM may not or cannot do something is saying that KVM will never
> do that thing.
> 
> And because the EPT flag is "ignore guest PAT", not "honor guest PAT", but that's
> as much coincidence as it is anything else.
> 
>> Since it is also controlled by other cases, e.g., kvm_arch_has_noncoherent_dma()
>> at vmx_get_mt_mask(), it can be 'may_honor_guest_pat' too?
>>
>> Therefore, why not directly use 'shadow_memtype_mask' (without the API), or some
>> naming like "ept_enabled_for_hardware".
> 
> Again, after this series, KVM will *always* honor guest PAT for CPUs with self-snoop,
> i.e. KVM will *never* ignore guest PAT.  But for CPUs without self-snoop (or with
> errata), KVM conditionally honors/ignores guest PAT.
> 
>> Even with the code from PATCH 5/5, we still have high chance that VM has
>> non-coherent DMA?
> 
> I don't follow.  On CPUs with self-snoop, whether or not the VM has non-coherent
> DMA (from VFIO!) is irrelevant.  If the CPU has self-snoop, then KVM can safely
> honor guest PAT at all times.

Thank you very much for the explanation.

According to my understanding of the explanation (after this series):

1. When static_cpu_has(X86_FEATURE_SELFSNOOP) == true, it is 100% to "honor
guest PAT".

2. When static_cpu_has(X86_FEATURE_SELFSNOOP) == false (and
shadow_memtype_mask), although only 50% chance (depending on where there is
non-coherent DMA), at least now it is NOT 100% (to honor guest PAT) any longer.

Due to the fact it is not 100% (to honor guest PAT) any longer, there starts the
trend (from 100% to 50%) to "ignore guest PAT", that is:
kvm_mmu_may_ignore_guest_pat().

Dongli Zhang