On Mon, Jul 27, 2020 at 11:14:03AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 21, 2020 at 11:49:04AM +0100, Alex Bennée wrote:
> > Stefan Hajnoczi <stefanha@xxxxxxxxx> writes:
> > > 2. Alexander Graf's idea for a new Linux driver that provides an
> > > enforcing software IOMMU. This would be a character device driver that
> > > is mmapped by the device emulation process (either vhost-user-style on
> > > the host or another VMM for inter-VM device emulation). The Driver VMM
> > > can program mappings into the device and the page tables in the device
> > > emulation process will be updated. This way the Driver VMM can share
> > > memory specific regions of guest RAM with the device emulation process
> > > and revoke those mappings later.
> >
> > I'm wondering if there is enough plumbing on the guest side so a guest
> > can use the virtio-iommu to mark out exactly which bits of memory the
> > virtual device can have access to? At a minimum the virtqueues need to
> > be accessible and for larger transfers maybe a bounce buffer. However

Just to make sure I didn't misunderstand - do you want to tell the guest
precisely where the buffers are, like "address X is the used ring,
address Y is the descriptor table", or do you want to specify a range of
memory where the guest can allocate DMA buffers, in no specific order,
for a given device? So far I've assumed we're talking about the latter.

> > for speed you want as wide as possible mapping but no more. It would be
> > nice for example if a block device could load data directly into the
> > guests block cache (zero-copy) but without getting a view of the kernels
> > internal data structures.
>
> Maybe Jean-Philippe or Eric can answer that?

Virtio-iommu could describe which bits of guest-physical memory are
available for DMA for a given device. It already provides an extensible
mechanism for describing per-device memory properties (the PROBE
request). And I think the virtio-iommu device could be used exclusively
for this, too, by having DMA bypass the VA->PA translation
(VIRTIO_IOMMU_F_BYPASS) and only enforcing guest-physical boundaries. Or
just describe the memory and not enforce anything. (A rough sketch of
what such a PROBE property could look like is appended at the end of
this mail.)

I don't know how to plug this into the DMA layer of a Linux guest,
though there does seem to be a per-device DMA pool infrastructure. Have
you looked at rproc_add_virtio_dev()? It seems to allocate a specific
DMA region per device, from a "memory-region" device-tree property, so
perhaps you could simply reuse this. (See the second sketch appended
below.)

Thanks,
Jean

> > Another thing that came across in the call was quite a lot of
> > assumptions about QEMU and Linux w.r.t virtio. While our project will
> > likely have Linux as a guest OS we are looking specifically at enabling
> > virtio for Type-1 hypervisors like Xen and the various safety certified
> > proprietary ones. It is unlikely that QEMU would be used as the VMM for
> > these deployments. We want to work out what sort of common facilities
> > hypervisors need to support to enable virtio so the daemons can be
> > re-usable and maybe setup with a minimal shim for the particular
> > hypervisor in question.
>
> The vhost-user protocol together with the backend program conventions
> define the wire protocol and command-line interface (see
> docs/interop/vhost-user.rst).
>
> vhost-user is already used by other VMMs today. For example,
> cloud-hypervisor implements vhost-user.
>
> I'm sure there is room for improvement, but it seems like an incremental
> step given that vhost-user already tries to cater for this scenario.
>
> Are there any specific gaps you have identified?
>
> Stefan
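
To make the PROBE idea above a little more concrete, here is a rough
sketch of what such a property could look like. Only the
virtio_iommu_probe_property header exists in the current spec; the
VIRTIO_IOMMU_PROBE_T_DMA_MEM type, its value and the struct layout are
made up for illustration and would need to go through the spec process:

/* Existing property header (include/uapi/linux/virtio_iommu.h) */
struct virtio_iommu_probe_property {
	__le16	type;
	__le16	length;
};

/*
 * Hypothetical property: a guest-physical range that the endpoint may
 * use for DMA. The type value below is purely illustrative.
 */
#define VIRTIO_IOMMU_PROBE_T_DMA_MEM	2

struct virtio_iommu_probe_dma_mem {
	struct virtio_iommu_probe_property	head;
	__u8					reserved[4];
	__le64					start;	/* first GPA usable for DMA */
	__le64					end;	/* last GPA usable for DMA */
};

The guest would parse this while handling the PROBE request for the
endpoint and could then restrict the device's DMA allocations to
[start, end], for instance by registering that range as a per-device
pool along the lines of the next sketch.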
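
And a minimal sketch of how a guest driver can pick up a per-device DMA
region today, using the existing reserved-memory infrastructure that
rproc_add_virtio_dev() builds on. The example_probe() function and its
device are invented for illustration; of_reserved_mem_device_init() and
dma_alloc_coherent() are existing kernel APIs, and the referenced
reserved-memory node is assumed to be a "shared-dma-pool" carveout:

#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>
#include <linux/sizes.h>

static int example_probe(struct platform_device *pdev)
{
	dma_addr_t dma;
	void *buf;
	int ret;

	/*
	 * Bind the device to the carveout referenced by its
	 * "memory-region" device-tree property, making it the device's
	 * coherent DMA pool.
	 */
	ret = of_reserved_mem_device_init(&pdev->dev);
	if (ret)
		return ret;

	/* Coherent allocations are now served from that carveout */
	buf = dma_alloc_coherent(&pdev->dev, SZ_4K, &dma, GFP_KERNEL);
	if (!buf) {
		of_reserved_mem_device_release(&pdev->dev);
		return -ENOMEM;
	}

	/* ... place virtqueues and bounce buffers inside the carveout ... */
	return 0;
}

Hooking the same mechanism up to a range advertised by virtio-iommu
(rather than a device-tree property) is the part I don't have an answer
for yet.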