> From: Chia-I Wu <olvaffe@xxxxxxxxx>
> Sent: Friday, February 21, 2020 6:24 AM
>
> On Wed, Feb 19, 2020 at 6:38 PM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> >
> > > From: Tian, Kevin
> > > Sent: Thursday, February 20, 2020 10:05 AM
> > >
> > > > From: Chia-I Wu <olvaffe@xxxxxxxxx>
> > > > Sent: Thursday, February 20, 2020 3:37 AM
> > > >
> > > > On Wed, Feb 19, 2020 at 1:52 AM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> > > > >
> > > > > > From: Paolo Bonzini
> > > > > > Sent: Wednesday, February 19, 2020 12:29 AM
> > > > > >
> > > > > > On 14/02/20 23:03, Sean Christopherson wrote:
> > > > > > >> On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu <olvaffe@xxxxxxxxx> wrote:
> > > > > > >>> AFAICT, it is currently allowed on ARM (verified) and AMD (not
> > > > > > >>> verified, but svm_get_mt_mask returns 0, which supposedly means
> > > > > > >>> the NPT does not restrict what the guest PAT can do). This diff
> > > > > > >>> would do the trick for Intel without needing any uapi change:
> > > > > > >> I would be concerned about Intel CPU errata such as SKX40 and SKX59.
> > > > > > > The part KVM cares about, #MC, is already addressed by forcing UC
> > > > > > > for MMIO. The data corruption issue is on the guest kernel to
> > > > > > > correctly use WC and/or non-temporal writes.
> > > > > >
> > > > > > What about coherency across live migration? The userspace process
> > > > > > would use cached accesses, and also a WBINVD could potentially
> > > > > > corrupt guest memory.
> > > > > >
> > > > >
> > > > > In such a case the userspace process should possibly use a UC mapping
> > > > > conservatively, as if for MMIO regions on a passthrough device. However,
> > > > > there remains a problem: the definition of KVM_MEM_DMA implies
> > > > > favoring the guest setting, which in concept could be any type, so
> > > > > assuming UC is also problematic. I'm not sure whether inventing another
> > > > > interface to query the effective memory type from KVM is a good idea.
> > > > > There is no guarantee that the guest will use the same type for every
> > > > > page in the same slot, so such an interface might be messy.
> > > > > Alternatively, maybe we could just have an interface for KVM userspace
> > > > > to force the memory type for a given slot, if it is mainly used in
> > > > > para-virtualized scenarios (e.g. virtio-gpu) where the guest is
> > > > > enlightened to use a forced type (e.g. WC)?
> > > > KVM forcing the memory type for a given slot should work too. But the
> > > > ignore-guest-pat bit seems to be Intel-specific. We will need to
> > > > define how the second-level page attributes combine with the guest
> > > > page attributes somehow.
> > >
> > > Oh, I'm not aware of that difference. Without an ipat-equivalent
> > > capability, I'm not sure how to force an arbitrary type here. If you look
> > > at table 11-7 in the Intel SDM, no MTRR (EPT) memory type leads to a
> > > consistent effective type when combined with an arbitrary PAT value. So
> > > it is definitely a dead end.
> > >
> > > >
> > > > KVM should in theory be able to tell that the userspace region is
> > > > mapped with a certain memory type and can force the same memory type
> > > > onto the guest. The userspace does not need to be involved. But that
> > > > sounds very slow? This may be a dumb question, but would it help to
> > > > add KVM_SET_DMA_BUF and let KVM negotiate the memory type with the
> > > > in-kernel GPU drivers?
> > > >
> > >
> > > KVM_SET_DMA_BUF looks more reasonable. But I guess we don't need
> > > KVM to be aware of such negotiation. We can continue your original
> > > proposal to have KVM simply favor the guest memory type (maybe still
> > > call it KVM_MEM_DMA). On the other hand, Qemu should just mmap the
> > > fd handle of the dmabuf passed from the virtio-gpu device backend,
> > > e.g. to conduct migration. That way the mmap request is finally
> > > served by DRM and the underlying GPU drivers, with the proper type
> > > enforced automatically.
> >
> > Thinking more, possibly we don't need to introduce a new interface to
> > KVM. As long as Qemu uses the dmabuf interface to mmap the specific
> > region, KVM can simply check the memory type in the host page table
> > given the hva of a memslot. If the type is UC or WC, it implies that
> > userspace wants a non-coherent mapping, which should be reflected on
> > the guest side too. In that case, KVM can take the non-coherent DMA
> > path and favor the guest memory type automatically.
>
> Sorry, I mixed two things together.
>
> Userspace access to dmabuf mmap must be guarded by
> DMA_BUF_SYNC_{START,END} ioctls. It is possible that the GPU driver
> always picks a WB mapping and lets the ioctls flush/invalidate CPU
> caches. We actually want the guest memory type to match vkMapMemory's
> memory type, which can be different from dmabuf mmap's memory type.
> It is not enough for KVM to inspect the hva's memory type.

I'm not familiar with dmabuf or the difference between vkMapMemory and
mmap. Just a simple thought: whatever memory type/synchronization is
enforced on host userspace should ideally be applied to guest userspace
too. E.g. in the above example we possibly want the guest to use WB and
issue flush/invalidate hypercalls to coordinate with potential parallel
operations on the host side. Otherwise I cannot see how synchronization
can work when one side uses WB with sync primitives while the other
simply uses WC without such primitives.

> KVM_SET_DMA_BUF, if supported, is a signal to KVM that the guest
> memory type should be honored (or forced if there is a new op in
> dma_buf_ops that tells KVM which memory type to force). The KVM_MEM_DMA
> flag in this RFC sends the same signal. Unless KVM_SET_DMA_BUF gives
> the userspace other features, such as attaching an unlimited number of
> dmabufs to subregions of a memslot, it is not very useful.

The good part of a new interface is its simplicity, but it works only
at slot granularity. Having KVM inspect the hva instead can support
page granularity, but adds run-time overhead. Let's see how Paolo
thinks. 😊

> If a uapi change is to be avoided, it is easiest that the guest memory
> type is always honored unless it causes #MC (i.e., is_mmio == true).

I feel this goes too far...

Thanks
Kevin
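
P.S. A few sketches for reference.

On the ipat point above: this is roughly how KVM computes the EPT
memory type on Intel today. A simplified sketch of vmx_get_mt_mask()
(not the verbatim upstream code; the CR0.CD and quirk handling is
omitted):

static u64 ept_memtype_sketch(struct kvm_vcpu *vcpu, gfn_t gfn,
			      bool is_mmio)
{
	/* MMIO is always forced to UC to avoid machine checks. */
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	/*
	 * Without non-coherent DMA, force WB and set ipat so the
	 * guest PAT is ignored entirely.
	 */
	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	/*
	 * With non-coherent DMA, leave ipat clear and take the type
	 * from the guest MTRRs, so the guest PAT combines with the
	 * EPT type per the SDM table mentioned above.
	 */
	return (u64)kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
	       VMX_EPT_MT_EPTE_SHIFT;
}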
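
On the DMA_BUF_SYNC_{START,END} bracketing: to make sure I understand
it, userspace CPU access would look roughly like this (based on the
uapi in <linux/dma-buf.h>; dmabuf_fd/map/len are stand-ins for a
mapped buffer):

#include <linux/dma-buf.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Bracket CPU access to a mmap'ed dmabuf so the exporting GPU driver
 * can flush/invalidate caches even if it handed out a WB mapping.
 */
static int cpu_write_dmabuf(int dmabuf_fd, void *map, size_t len)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
		return -1;

	memset(map, 0, len);	/* CPU access happens here */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}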
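
And if we did go the KVM_SET_DMA_BUF route, I imagine the uapi could
look something like below. To be clear, this is purely hypothetical --
the ioctl number, struct name, and fields are invented for
illustration only:

/* HYPOTHETICAL: no such ioctl exists today. */
struct kvm_dma_buf {
	__u32 slot;		/* memslot to attach the dmabuf to */
	__u32 dmabuf_fd;	/* fd exported by the GPU driver */
	__u64 guest_phys_addr;	/* subregion start within the slot */
	__u64 size;		/* subregion size in bytes */
};

#define KVM_SET_DMA_BUF _IOW(KVMIO, 0xff, struct kvm_dma_buf)

KVM could then resolve the memory type from the attached dmabuf
(e.g. via a new dma_buf_ops hook, as you suggested) instead of
inspecting the hva.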