On Fri, Mar 06, 2015 at 01:08:29PM -0800, Mario Smarduch wrote:
> On 03/05/2015 09:43 AM, Paolo Bonzini wrote:
> >
> > On 05/03/2015 15:58, Catalin Marinas wrote:
> >>> It would especially suck if the user has a cluster with different
> >>> machines, some of them coherent and others non-coherent, and then
> >>> has to debug why the same configuration works on some machines and
> >>> not on others.
> >>
> >> That's a problem indeed, especially with guest migration. But I
> >> don't think we have any sane solution here for the bus master DMA.
> >
> > I do not oppose doing cache management in QEMU for bus master DMA
> > (though if the solution you outlined below works it would be great).
> >
> >> ARM can override them as well, but only by making them stricter.
> >> Otherwise, on a weakly ordered architecture, it's not always safe
> >> (let's say the guest thinks it accesses Strongly Ordered memory and
> >> avoids barriers for flag updates, but the host "upgrades" it to
> >> Cacheable, which breaks the memory order).
> >
> > The same can happen on x86 though, even if it's rarer. You still
> > need a barrier between stores and loads.
> >
> >> If we want the host to enforce guest memory mapping attributes via
> >> stage 2, we could do it the other way around: get the guests to
> >> always assume full cache coherency, generating Normal Cacheable
> >> mappings, but use the stage 2 attributes restriction in the host to
> >> make such mappings non-cacheable when needed (it works this way on
> >> ARM, but not in the other direction to relax the attributes).
> >
> > That sounds like a plan for device assignment. But it still would
> > not solve the problem of the MMIO framebuffer, right?
> >
> >>> The problem arises with MMIO areas that the guest can reasonably
> >>> expect to be uncacheable, but that are optimized by the host so
> >>> that they end up backed by cacheable RAM. It's perfectly
> >>> reasonable that the same device needs a cacheable mapping with one
> >>> userspace, and works with an uncacheable mapping with another
> >>> userspace that doesn't optimize the MMIO area to RAM.
> >>
> >> Unless the guest allocates the framebuffer itself (e.g. with
> >> dma_alloc_coherent), we can't control the cacheability via
> >> "dma-coherent" properties, as they refer to bus master DMA.
> >
> > Okay, it's good to rule that out. One less thing to think about. :)
> > Same for _DSD.
> >
> >> So for MMIO with the buffer allocated by the host (Qemu), the only
> >> solution I see on ARM is for the host to ensure coherency, either
> >> via explicit cache maintenance (a new KVM API) or by changing the
> >> memory attributes used by Qemu to access such virtual MMIO.
> >>
> >> Basically Qemu is acting as a bus master when reading the
> >> framebuffer it allocated, but the guest considers it a slave access
> >> and we don't have a way to tell the guest that such accesses should
> >> be cacheable, nor can we upgrade them via architecture features.
> >
> > Yes, that's a way to put it.
> >
> >>> In practice, the VGA framebuffer has an optimization that uses
> >>> dirty page tracking, so we could piggyback on the ioctls that
> >>> return which pages are dirty. It turns out that piggybacking on
> >>> those ioctls should also fix the case of migrating a guest while
> >>> the MMU is disabled.
> >>
> >> Yes, Qemu would need to invalidate the cache before reading a dirty
> >> framebuffer page.
> >>
> >> As I said above, an API that allows non-cacheable mappings for the
> >> VGA framebuffer in Qemu would also solve the problem. I'm not sure
> >> what KVM provides here (or whether we can add such an API).
> >
> > Nothing for now; other architectures simply do not have the issue.
> >
> > As long as it's just VGA, we can quirk it. There are just a couple
> > of vendor/device IDs to catch, and the guest can then use a
> > cacheable mapping.
> >
> > For a more generic solution, the API would be
> > madvise(MADV_DONTCACHE). It would be easy for QEMU to use it, but I
> > am not too optimistic about convincing the mm folks about it. We
> > can try.

I forgot to list this one in my summary of approaches[*]. It's a nice,
clean approach that avoids getting cache maintenance into everything.
However, besides the difficulty of getting it past the mm people, it
reduces performance for any userspace-userspace use or sharing of the
memory, while only userspace-guest sharing actually requires the cache
maintenance. Maybe that's not an important concern for the few
emulated devices that need it, though.
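Roughly how I'd picture the QEMU side, if such a flag existed
(MADV_DONTCACHE is hypothetical, and the value below is just a
placeholder so the sketch compiles; the fallback path is what we'd do
today anyway):

  #include <stdio.h>
  #include <sys/mman.h>

  /* Hypothetical flag -- MADV_DONTCACHE does not exist upstream. */
  #ifndef MADV_DONTCACHE
  #define MADV_DONTCACHE 100
  #endif

  /*
   * Allocate a framebuffer-sized region and ask the kernel to map it
   * non-cacheable, so the guest's uncached stage 1 accesses and
   * QEMU's reads agree on memory attributes without any explicit
   * cache maintenance.
   */
  static void *alloc_uncached_fb(size_t size)
  {
      void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (buf == MAP_FAILED)
          return NULL;

      if (madvise(buf, size, MADV_DONTCACHE) < 0) {
          /* Not supported: fall back to cacheable memory plus cache
           * invalidation of pages flagged by the dirty log. */
          perror("madvise(MADV_DONTCACHE)");
      }

      return buf;
  }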
> > Interested to see the outcome.
>
> I was thinking of a very basic memory driver that can provide an
> uncached memslot to QEMU: in its mmap() file operation it would
> apply pgprot_uncached to the allocated pages, lock them, flush the
> TLB and call remap_pfn_range().

I guess this is the same as the madvise approach, but with a driver.
KVM could take this approach itself when memslots are added/updated
with the INCOHERENT flag. Maybe worth some experimental patches to
find out? A rough sketch of what such a driver's mmap() path could
look like is below. I'm still thinking about experimenting with the
ARM private syscalls next, though.

drew

[*] http://lists.gnu.org/archive/html/qemu-devel/2015-03/msg01254.html
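A minimal sketch of that mmap() path (my guess at what Mario
describes, untested; I've used the generic pgprot_noncached() helper,
and a real driver would also have to deal with the kernel's cacheable
linear alias of the pages, e.g. by allocating them with
dma_alloc_coherent() instead of alloc_pages()):

  #include <linux/module.h>
  #include <linux/miscdevice.h>
  #include <linux/fs.h>
  #include <linux/gfp.h>
  #include <linux/mm.h>

  #define UNCACHED_BUF_ORDER 4  /* 64K buffer with 4K pages */

  /* A single static buffer keeps the sketch short; per-open buffers
   * would be more realistic. */
  static struct page *uncached_pages;

  static int uncached_mmap(struct file *file,
                           struct vm_area_struct *vma)
  {
      unsigned long size = vma->vm_end - vma->vm_start;

      if (size > (PAGE_SIZE << UNCACHED_BUF_ORDER))
          return -EINVAL;

      /* Map the pages into QEMU non-cacheable, matching the guest's
       * uncached view of the virtual framebuffer. */
      vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

      return remap_pfn_range(vma, vma->vm_start,
                             page_to_pfn(uncached_pages),
                             size, vma->vm_page_prot);
  }

  static const struct file_operations uncached_fops = {
      .owner = THIS_MODULE,
      .mmap  = uncached_mmap,
  };

  static struct miscdevice uncached_dev = {
      .minor = MISC_DYNAMIC_MINOR,
      .name  = "uncached_memslot",  /* hypothetical device name */
      .fops  = &uncached_fops,
  };

  static int __init uncached_init(void)
  {
      uncached_pages = alloc_pages(GFP_KERNEL, UNCACHED_BUF_ORDER);
      if (!uncached_pages)
          return -ENOMEM;
      return misc_register(&uncached_dev);
  }

  static void __exit uncached_exit(void)
  {
      misc_deregister(&uncached_dev);
      __free_pages(uncached_pages, UNCACHED_BUF_ORDER);
  }

  module_init(uncached_init);
  module_exit(uncached_exit);
  MODULE_LICENSE("GPL");

QEMU would then mmap() the device and hand the result to
KVM_SET_USER_MEMORY_REGION like any other memslot backing.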