> From: Chia-I Wu <olvaffe@xxxxxxxxx>
> Sent: Thursday, February 20, 2020 3:18 AM
>
> On Wed, Feb 19, 2020 at 2:00 AM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> >
> > > From: Chia-I Wu
> > > Sent: Saturday, February 15, 2020 5:15 AM
> > >
> > > On Fri, Feb 14, 2020 at 2:26 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > > >
> > > > On 13/02/20 23:18, Chia-I Wu wrote:
> > > > >
> > > > > The bug you mentioned was probably this one
> > > > >
> > > > >   https://bugzilla.kernel.org/show_bug.cgi?id=104091
> > > >
> > > > Yes, indeed.
> > > >
> > > > > From what I can tell, the commit allowed the guests to create
> > > > > cached mappings to MMIO regions and caused MCEs.  That is
> > > > > different than what I need, which is to allow guests to create
> > > > > uncached mappings to system ram (i.e., !kvm_is_mmio_pfn) when the
> > > > > host userspace also has uncached mappings.  But it is true that
> > > > > this still allows the userspace & guest kernel to create
> > > > > conflicting memory types.
> > > >
> > > > Right, the question is whether the MCEs were tied to MMIO regions
> > > > specifically and if so why.
> > > >
> > > > An interesting remark is in the footnote of table 11-7 in the SDM.
> > > > There, for the MTRR (EPT for us) memory type UC you can read:
> > > >
> > > >   The UC attribute comes from the MTRRs and the processors are not
> > > >   required to snoop their caches since the data could never have
> > > >   been cached. This attribute is preferred for performance reasons.
> > > >
> > > > There are two possibilities:
> > > >
> > > > 1) the footnote doesn't apply to UC mode coming from EPT page tables.
> > > > That would make your change safe.
> > > >
> > > > 2) the footnote also applies when the UC attribute comes from the EPT
> > > > page tables rather than the MTRRs.  In that case, the host should use
> > > > UC as the EPT page attribute if and only if it's consistent with the
> > > > host MTRRs; it would be more or less impossible to honor UC in the
> > > > guest MTRRs.  In that case, something like the patch below would be
> > > > needed.
> > > >
> > > > It is not clear from the manual why the footnote would not apply to
> > > > WC; that is, the manual doesn't say explicitly that the processor
> > > > does not do snooping for accesses to WC memory.  But I guess that
> > > > must be the case, which is why I used MTRR_TYPE_WRCOMB in the patch
> > > > below.
> > > >
> > > > Either way, we would have an explanation of why creating cached
> > > > mapping to MMIO regions would, and why in practice we're not seeing
> > > > MCEs for guest RAM (the guest would have set WB for that memory in
> > > > its MTRRs, not UC).
> > > >
> > > > One thing you didn't say: how would userspace use KVM_MEM_DMA?  On
> > > > which regions would it be set?
> > > It will be set for shmems that are mapped WC.
> > >
> > > GPU/DRM drivers allocate shmems as DMA-able gpu buffers and allow the
> > > userspace to map them cached or WC (I915_MMAP_WC or
> > > AMDGPU_GEM_CREATE_CPU_GTT_USWC for example).  When a shmem is mapped
> > > WC and is made available to the guest, we would like the ability to
> > > map the region WC in the guest.
> >
> > Curious... How is such slot exposed to the guest? A reserved memory
> > region? Is it static or might be dynamically added?
> The plan is for virtio-gpu device to reserve a huge memory region in
> the guest.  Memslots may be added dynamically or statically to back
> the region.

So the region is marked as E820_RESERVED to prevent the guest kernel from
using it for other purposes, and the virtio-gpu device then reports the
exact location of the region to the virtio-gpu driver through its own
interface?

>
> Dynamic: the host adds a 16MB GPU allocation as a memslot at a time.
> The guest kernel suballocates from the 16MB pool.
>
> Static: the host creates a huge PROT_NONE memfd and adds it as a
> memslot.  GPU allocations are mremap()ed into the memfd region to
> provide the real mapping.
>
> These options are considered because the number of memslots are
> limited: 32 on ARM and 509 on x86.  If the number of memslots could be
> made larger (4096 or more), we would also consider adding each
> individual GPU allocation as a memslot.
>
> These are actually questions we need feedback.  Besides, GPU
> allocations can be assumed to be kernel dma-bufs in this context.  I
> wonder if it makes sense to have a variation of
> KVM_SET_USER_MEMORY_REGION that takes dma-bufs.

I feel it makes more sense.

Thanks
Kevin
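
For readers following the "static" option quoted above (one large PROT_NONE
memfd mapping used as a single memslot, with GPU allocations mremap()ed
into it), here is a minimal userspace sketch of just the address-space
part. It is an illustration under assumptions, not code from this thread:
the helper names, the 64MB/16MB sizes, and the use of a second memfd as a
stand-in for a GPU buffer are all hypothetical, and no KVM ioctl is issued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical sizes, for illustration only. */
#define WINDOW_SIZE (64UL << 20) /* stand-in for the "huge" reserved region */
#define ALLOC_SIZE  (16UL << 20) /* one GPU allocation */

/* Create a memfd of the given size and map it shared with protection prot. */
static void *map_memfd(const char *name, size_t size, int prot)
{
    int fd = memfd_create(name, 0);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, size, prot, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the memfd alive */
    return p;
}

/* Move an existing mapping to window + offset. MREMAP_FIXED replaces
 * whatever was mapped there (here: part of the PROT_NONE window). */
static void *place_in_window(void *window, size_t offset,
                             void *buf, size_t size)
{
    assert(offset % (size_t)sysconf(_SC_PAGESIZE) == 0);
    return mremap(buf, size, size, MREMAP_MAYMOVE | MREMAP_FIXED,
                  (char *)window + offset);
}

/* Returns 0 if the allocation becomes visible inside the window. */
int demo_static_window(void)
{
    /* In the scheme above, this range would be registered once as a
     * memslot; that step is deliberately left out of this sketch. */
    char *window = map_memfd("gpu-window", WINDOW_SIZE, PROT_NONE);
    if (window == MAP_FAILED)
        return -1;

    /* Assumption: a plain memfd stands in for a real GPU buffer. */
    char *buf = map_memfd("fake-gpu-bo", ALLOC_SIZE, PROT_READ | PROT_WRITE);
    if (buf == MAP_FAILED)
        return -1;
    strcpy(buf, "hello");

    char *placed = place_in_window(window, ALLOC_SIZE, buf, ALLOC_SIZE);
    if (placed != window + ALLOC_SIZE)
        return -1;

    int ok = strcmp(placed, "hello") == 0;
    munmap(window, WINDOW_SIZE); /* also drops the placed allocation */
    return ok ? 0 : -1;
}
```

The sketch only shows that the host virtual address range stays fixed while
its backing changes. Whether KVM picks up such a change correctly, and with
which memory type, is the open question in this thread.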