Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Thu, 05 Mar 2015 13:26:39 +0100

On 05/03/2015 13:03, Catalin Marinas wrote:
>> > 
>> > I'd hate to have to do that. PCI should be entirely probeable
>> > given that we tell the guest where the host bridge is, that's
>> > one of its advantages.
> I didn't say a DT node per device, the DT doesn't know what PCI devices
> are available (otherwise it defeats the idea of probing). But we need to
> tell the OS where the host bridge is via DT.
> 
> So the guest would be told about two host bridges: one for real devices
> and another for virtual devices. These can have different coherency
> properties.

Yeah, and it would suck that the user needs to know the difference
between the coherency proprties of the host bridges.

It would especially suck if the user has a cluster with different
machines, some of them coherent and others non-coherent, and then has to
debug why the same configuration works on some machines and not on others.

To avoid replying in two different places, which of the solutions look
to me like something that half-works?  Pretty much all of them, because
in the end it is just a processor misfeature.  For example, Intel
virtualization extensions let the hypervisor override stage1 translation
_if necessary_.  AMD doesn't, but has some other quirky things that let
you achieve the same effect..

In particular, I am not even sure that this is about bus coherency,
because this problem does not happen when the device is doing bus master
DMA.  Working around coherency for bus master DMA would be easy.

The problem arises with MMIO areas that the guest can reasonably expect
to be uncacheable, but that are optimized by the host so that they end
up backed by cacheable RAM.  It's perfectly reasonable that the same
device needs cacheable mapping with one userspace, and works with
uncacheable mapping with another userspace that doesn't optimize the
MMIO area to RAM.

Currently the VGA framebuffer is the main case where this happen, and I
don't expect many more.  Because this is not bus master DMA, it's hard
to find a QEMU API that can be hooked to invalidate the cache.  QEMU is
just reading from an array of chars.

In practice, the VGA framebuffer has an optimization that uses dirty
page tracking, so we could piggyback on the ioctls that return which
pages are dirty.  It turns out that piggybacking on those ioctls also
should fix the case of migrating a guest while the MMU is disabled.

But it's a hack, and it may not work for other devices.

We could use _DSD to export the device tree property separately for each
device, but that wouldn't work for hotplugged devices.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html