Re: issues with emulated PCI MMIO backed by host memory under KVM

> On 27.06.2016 at 12:34, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> 
>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
>>> Hi,
>>> 
>>> I'm going to ask some stupid questions here...
>>> 
>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>>>> Hi all,
>>>> 
>>>> This old subject came up again in a discussion related to PCIe support
>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>>>> regions as cacheable is preventing us from reusing a significant slice
>>>> of the PCIe support infrastructure, and so I'd like to bring this up
>>>> again, perhaps just to reiterate why we're simply out of luck.
>>>> 
>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>>>> for emulated devices may be backed by memory that is mapped cacheable
>>>> by the host. Note that this has nothing to do with the device being
>>>> DMA coherent or not: in this case, we are dealing with regions that
>>>> are not memory from the POV of the guest, and it is reasonable for the
>>>> guest to assume that accesses to such a region are not visible to the
>>>> device before they hit the actual PCI MMIO window and are translated
>>>> into cycles on the PCI bus.
>>> 
>>> For the sake of completeness, why is this reasonable?
>> 
>> Because the whole point of accessing these regions is to communicate
>> with the device. It is common to use write combining mappings for
>> things like framebuffers to group writes before they hit the PCI bus,
>> but any caching just makes it more difficult for the driver state and
>> device state to remain synchronized.
>> 
>>> Is this how any real ARM system implementing PCI would actually work?
>> 
>> Yes.
>> 
>>>> That means that mapping such a region
>>>> cacheable is a strange thing to do, in fact, and it is unlikely that
>>>> patches implementing this against the generic PCI stack in Tianocore
>>>> will be accepted by the maintainers.
>>>> 
>>>> Note that this issue not only affects framebuffers on PCI cards, it
>>>> also affects emulated USB host controllers (perhaps Alex can remind us
>>>> which one exactly?) and likely other emulated generic PCI devices as
>>>> well.
>>>> 
>>>> Since the issue exists only for emulated PCI devices whose MMIO
>>>> regions are backed by host memory, is there any way we can already
>>>> distinguish such memslots from ordinary ones? If we can, is there
>>>> anything we could do to treat these specially? Perhaps something like
>>>> using read-only memslots so we can at least trap guest writes instead
>>>> of having main memory going out of sync with the caches unnoticed? I
>>>> am just brainstorming here ...
>>> 
>>> I think the only sensible solution is to make sure that the guest and
>>> emulation mappings use the same memory type, either cached or
>>> non-cached, and we 'simply' have to find the best way to implement this.
>>> 
>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>>> one way.
>>> 
>>> The other way is to use something like what you once wrote that rewrites
>>> stage-1 mappings to be cacheable, does that apply here ?
>>> 
>>> Do we have a clear picture of why we'd prefer one way over the other?
>> 
>> So first of all, let me reiterate that I could only find a single
>> instance in QEMU where a PCI MMIO region is backed by host memory,
>> which is vga-pci.c. I wonder if there are any other occurrences, but
>> if there aren't any, it makes much more sense to prohibit PCI BARs
>> backed by host memory rather than spend a lot of effort working around
>> it.
> 
> Right, ok.  So Marc's point during his KVM Forum talk was basically,
> don't use the legacy VGA adapter on ARM and use virtio graphics, right?
> 
> What is the proposed solution for someone shipping an ARM server and
> wishing to provide a graphical output for that server?

Well, there is at least one server that I know of that has PCI VGA built in ;).

I think he was more concerned about VMs rather than real hardware.

> 
> It feels strange to work around supporting PCI VGA adapters in ARM VMs,
> if that's not a supported real hardware case.  However, I don't see what
> would prevent someone from plugging a VGA adapter into the PCI slot on
> an ARM server, and people selling ARM servers probably want this to
> happen, I'm guessing.
> 
>> 
>> If we do decide to fix this, the best way would be to use uncached
>> attributes for the QEMU userland mapping, and force it uncached in the
>> guest via a stage 2 override (as Drew suggests). The only problem I
>> see here is that the host's kernel direct mapping has a cached alias
>> that we need to get rid of.
> 
> Do we have a way to accomplish that?
> 
> Will we run into a bunch of other problems if we begin punching holes in
> the direct mapping for regular RAM?

Yeah, and how do you deal with aliases on that memory? You'd also need to stop KSM from running on it, for example.

Alex


_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


