On 05/03/2015 15:58, Catalin Marinas wrote:
>> It would especially suck if the user has a cluster with different
>> machines, some of them coherent and others non-coherent, and then has
>> to debug why the same configuration works on some machines and not on
>> others.
>
> That's a problem indeed, especially with guest migration. But I don't
> think we have any sane solution here for the bus master DMA.

I do not oppose doing cache management in QEMU for bus master DMA
(though if the solution you outlined below works, it would be great).

> ARM can override them as well but only making them stricter. Otherwise,
> on a weakly ordered architecture, it's not always safe (let's say the
> guest thinks it accesses Strongly Ordered memory and avoids barriers
> for flag updates but the host "upgrades" it to Cacheable which breaks
> the memory order).

The same can happen on x86 though, even if it's rarer. You still need a
barrier between stores and loads.

> If we want the host to enforce guest memory mapping attributes via
> stage 2, we could do it the other way around: get the guests to always
> assume full cache coherency, generating Normal Cacheable mappings, but
> use the stage 2 attributes restriction in the host to make such
> mappings non-cacheable when needed (it works this way on ARM but not
> in the other direction to relax the attributes).

That sounds like a plan for device assignment. But it still would not
solve the problem of the MMIO framebuffer, right?

>> The problem arises with MMIO areas that the guest can reasonably
>> expect to be uncacheable, but that are optimized by the host so that
>> they end up backed by cacheable RAM. It's perfectly reasonable that
>> the same device needs cacheable mapping with one userspace, and works
>> with uncacheable mapping with another userspace that doesn't optimize
>> the MMIO area to RAM.
>
> Unless the guest allocates the framebuffer itself (e.g.
> dma_alloc_coherent), we can't control the cacheability via
> "dma-coherent" properties as it refers to bus master DMA.

Okay, it's good to rule that out. One less thing to think about. :)
Same for _DSD.

> So for MMIO with the buffer allocated by the host (Qemu), the only
> solution I see on ARM is for the host to ensure coherency, either via
> explicit cache maintenance (new KVM API) or by changing the memory
> attributes used by Qemu to access such virtual MMIO.
>
> Basically Qemu is acting as a bus master when reading the framebuffer
> it allocated but the guest considers it a slave access and we don't
> have a way to tell the guest that such accesses should be cacheable,
> nor can we upgrade them via architecture features.

Yes, that's a way to put it.

>> In practice, the VGA framebuffer has an optimization that uses dirty
>> page tracking, so we could piggyback on the ioctls that return which
>> pages are dirty. It turns out that piggybacking on those ioctls also
>> should fix the case of migrating a guest while the MMU is disabled.
>
> Yes, Qemu would need to invalidate the cache before reading a dirty
> framebuffer page.
>
> As I said above, an API that allows non-cacheable mappings for the VGA
> framebuffer in Qemu would also solve the problem. I'm not sure what KVM
> provides here (or whether we can add such API).

Nothing for now; other architectures simply do not have the issue. As
long as it's just VGA, we can quirk it. There are just a couple of
vendor/device IDs to catch, and the guest can then use a cacheable
mapping.

For a more generic solution, the API would be madvise(MADV_DONTCACHE).
It would be easy for QEMU to use it, but I am not too optimistic about
convincing the mm folks about it. We can try.
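To make the dirty-log idea concrete, here is a rough, untested sketch of
what the QEMU side could look like on an arm64 host: fetch the dirty
bitmap for the framebuffer memslot with KVM_GET_DIRTY_LOG, then
clean+invalidate the data cache for every dirty page before the display
code reads it. The slot number, framebuffer size, page size and cache
line size are placeholders, the bitmap indexing assumes a 64-bit
little-endian host, and "dc civac" is assumed to be usable from EL0 (as
Linux/arm64 normally allows).

/*
 * Rough sketch, error handling omitted: after fetching the dirty bitmap
 * for the framebuffer memslot, clean+invalidate the data cache for each
 * dirty page before reading it through QEMU's cacheable mapping.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define FB_SLOT       3            /* placeholder memslot id */
#define FB_PAGES      512          /* placeholder framebuffer size in pages */
#define FB_PAGE_SIZE  4096
#define CACHE_LINE    64           /* would really come from CTR_EL0 */

static void dcache_clean_invalidate(void *start, size_t len)
{
    uintptr_t p = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);

    /* Clean+invalidate by VA to the point of coherency, line by line. */
    for (; p < (uintptr_t)start + len; p += CACHE_LINE)
        asm volatile("dc civac, %0" : : "r" (p) : "memory");
    asm volatile("dsb sy" : : : "memory");
}

static void sync_dirty_fb_pages(int vm_fd, uint8_t *fb)
{
    uint64_t bitmap[(FB_PAGES + 63) / 64];
    struct kvm_dirty_log log = {
        .slot = FB_SLOT,
        .dirty_bitmap = bitmap,
    };

    memset(bitmap, 0, sizeof(bitmap));
    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
        return;

    /* Only touch the pages the guest actually wrote since last time. */
    for (unsigned i = 0; i < FB_PAGES; i++)
        if (bitmap[i / 64] & (1ULL << (i % 64)))
            dcache_clean_invalidate(fb + (size_t)i * FB_PAGE_SIZE,
                                    FB_PAGE_SIZE);
}

A real implementation would of course read the cache line size from
CTR_EL0 and hook into QEMU's existing dirty-bitmap machinery rather
than calling the ioctl directly, but the above shows the shape of it.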
Paolo