On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu <olvaffe@xxxxxxxxx> wrote: > > On Fri, Feb 14, 2020 at 11:52 AM Sean Christopherson > <sean.j.christopherson@xxxxxxxxx> wrote: > > > > On Fri, Feb 14, 2020 at 11:26:06AM +0100, Paolo Bonzini wrote: > > > On 13/02/20 23:18, Chia-I Wu wrote: > > > > > > > > The bug you mentioned was probably this one > > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=104091 > > > > > > Yes, indeed. > > > > > > > From what I can tell, the commit allowed the guests to create cached > > > > mappings to MMIO regions and caused MCEs. That is different than what > > > > I need, which is to allow guests to create uncached mappings to system > > > > ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached > > > > mappings. But it is true that this still allows the userspace & guest > > > > kernel to create conflicting memory types. > > > > This is ok. > > > > > Right, the question is whether the MCEs were tied to MMIO regions > > > specifically and if so why. > > > > 99.99999% likelihood the answer is "yes". Cacheable accesses to non-existent > > memory and most (all?) MMIO regions will cause a #MC. This includes > > speculative accesses. > > > > Commit fd717f11015f ("KVM: x86: apply guest MTRR virtualization on host > > reserved pages") explicitly had a comment "1. MMIO: trust guest MTRR", > > which is basically a direct avenue to generating #MCs. > > > > IIRC, WC accesses to non-existent memory will also cause #MC, but KVM has > > bigger problems if it has PRESENT EPTEs pointing at garbage. > > > > > An interesting remark is in the footnote of table 11-7 in the SDM. > > > There, for the MTRR (EPT for us) memory type UC you can read: > > > > > > The UC attribute comes from the MTRRs and the processors are not > > > required to snoop their caches since the data could never have > > > been cached. This attribute is preferred for performance reasons. > > > > > > There are two possibilities: > > > > > > 1) the footnote doesn't apply to UC mode coming from EPT page tables. > > > That would make your change safe. > > > > > > 2) the footnote also applies when the UC attribute comes from the EPT > > > page tables rather than the MTRRs. In that case, the host should use > > > UC as the EPT page attribute if and only if it's consistent with the host > > > MTRRs; it would be more or less impossible to honor UC in the guest MTRRs. > > > In that case, something like the patch below would be needed. > > > > (2), the EPTs effectively replace the MTRRs. The expectation being that > > the VMM will use always use EPT memtypes consistent with the MTRRs. > This is my understanding as well. > > > > It is not clear from the manual why the footnote would not apply to WC; that > > > is, the manual doesn't say explicitly that the processor does not do snooping > > > for accesses to WC memory. But I guess that must be the case, which is why I > > > used MTRR_TYPE_WRCOMB in the patch below. > > > > A few paragraphs below table 11-12 states: > > > > In particular, a WC page must never be aliased to a cacheable page because > > WC writes may not check the processor caches. > > > > > Either way, we would have an explanation of why creating cached mapping to > > > MMIO regions would, and why in practice we're not seeing MCEs for guest RAM > > > (the guest would have set WB for that memory in its MTRRs, not UC). > > > > Aliasing (physical) RAM with different memtypes won't cause #MC, just > > memory corruption. > > What we need potentially gives the userspace (the guest kernel, to be > exact) the ability to create conflicting memory types. If we can be > sure that the worst scenario is for a guest to corrupt its own memory, > by only allowing aliases on physical ram, that seems alright. > > AFAICT, it is currently allowed on ARM (verified) and AMD (not > verified, but svm_get_mt_mask returns 0 which supposedly means the NPT > does not restrict what the guest PAT can do). This diff would do the > trick for Intel without needing any uapi change: I would be concerned about Intel CPU errata such as SKX40 and SKX59.