Re: passing FDs across domains

Gerd Hoffmann <kraxel@xxxxxxxxxx> · Wed, 20 Mar 2019 13:11:32 +0100

  Hi,

> While Tomeu and others have been mostly considering the graphics use
> cases here (2D, 3D, Wayland), we've been looking into a wider range of
> use cases involving passing buffers between guest and host:
>  - video decoder/encoder (the general idea of the hardware may be
> inferred from corresponding V4L2 APIs modeling those:
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html),

Camera was mentioned too.

>  - generic IPC pass-through (e.g. Mojo, used extensively in Chromium),

What do you need here?

>  - crypto hardware accelerators.

Note: there is virtio-crypto.

> > > 1. Denial of Service.  A mechanism that involves pinning and relies on
> > > guest cooperation needs to be careful to prevent buggy or malicious
> > > guests from hogging resources that the host cannot reclaim.
> 
> I think we're talking about guest pages pinned by the guest kernel,
> right? The host (VMM) would be still the ultimate owner of all the
> memory given to the guest (VM), so how would it make it impossible for
> the host to reclaim that memory? Of course forcefully removing those
> pages from the guest could possibly crash it, but if it was
> malicious/buggy, I wouldn't expect it to work correctly anyway.

Guest dma-bufs are pinned in the guest kernel only.

The guest may ask the host to create host dma-bufs (which are pinned in
the host kernel), in which case the host can allow or deny that depending
on policy, limits, ...

> 2) The buffer was allocated entirely by the guest, from regular guest
> pages. (Currently we don't support this in Chromium OS.)
> 
> I assume the guest would send a list of pages (physical addresses,
> pfns?) to share.

Almost.  It's not *that* simple: there can be an iommu, so we have to use
dma addresses, not ram addresses.  But, yes, this is what virtio-gpu is
fundamentally doing when creating resources.

> The host should be able to translate those into its
> own user memory addresses, since they would have been inserted to the
> guest by the host itself (using KVM_SET_USER_MEMORY_REGION). Then,
> whoever wants to access such memory on host side would have to import
> that to the host kernel, using some kind of userptr API, which would
> pin those pages on the host side.

WIP patches exist to use the new udmabuf driver for turning those guest
virtio-gpu resources into host dma-bufs.
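
A minimal sketch of what the udmabuf step looks like from userspace,
assuming guest ram is backed by a memfd.  The helper names
(make_guest_ram_memfd, export_range_as_dmabuf) are made up for
illustration and are not taken from the WIP patches; only the
UDMABUF_CREATE ioctl and struct udmabuf_create come from
<linux/udmabuf.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: a memfd standing in for guest ram.
 * udmabuf only accepts memfds that carry F_SEAL_SHRINK. */
int make_guest_ram_memfd(size_t size)
{
    int fd = memfd_create("guest-ram", MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: export one page-aligned range of the memfd as a
 * host dma-buf.  Returns the dma-buf fd, or -1 on failure (for example
 * when /dev/udmabuf is missing or not accessible). */
int export_range_as_dmabuf(int memfd, uint64_t offset, uint64_t size)
{
    struct udmabuf_create create = {
        .memfd  = (uint32_t)memfd,
        .flags  = UDMABUF_FLAGS_CLOEXEC,
        .offset = offset,   /* must be page aligned */
        .size   = size,     /* must be page aligned */
    };
    int dev, buf;

    assert(memfd >= 0);
    dev = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
    if (dev < 0)
        return -1;
    buf = ioctl(dev, UDMABUF_CREATE, &create);
    close(dev);
    return buf < 0 ? -1 : buf;
}
```

A scattered resource would need UDMABUF_CREATE_LIST with one entry per
range instead of the single-range ioctl shown here.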

> With that, even if the guest goes away, corresponding host memory
> would be still available, given that the above look-up and import
> procedure is mutually exclusive with guest termination code. Again, a
> simple mutex protecting the VMM user memory bookkeeping structures
> should work.

Yes.
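
As a rough illustration of that lookup, here is a sketch under
assumptions: struct vmm_memory, gpa_to_hva and guest_terminate are
hypothetical names for the VMM bookkeeping, not code from qemu or
crosvm.  The slots mirror what the VMM registered with
KVM_SET_USER_MEMORY_REGION, and one mutex serializes lookup against
guest teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8

/* One entry per region the VMM handed to the guest. */
struct mem_slot {
    uint64_t guest_phys_addr;
    uint64_t size;
    uint8_t *host_addr;
};

struct vmm_memory {
    pthread_mutex_t lock;
    int guest_alive;
    size_t nslots;
    struct mem_slot slots[MAX_SLOTS];
};

/* Translate a guest physical range into a host virtual address.
 * Returns NULL if the guest is gone or the range does not fit in a
 * single slot.  A real implementation would also import/pin the pages
 * before dropping the lock; this sketch only does the lookup. */
uint8_t *gpa_to_hva(struct vmm_memory *mem, uint64_t gpa, uint64_t len)
{
    uint8_t *hva = NULL;

    assert(mem != NULL);
    pthread_mutex_lock(&mem->lock);
    if (mem->guest_alive) {
        for (size_t i = 0; i < mem->nslots; i++) {
            const struct mem_slot *s = &mem->slots[i];
            uint64_t off = gpa - s->guest_phys_addr;

            /* Written this way so that gpa + len cannot overflow. */
            if (gpa >= s->guest_phys_addr && off < s->size &&
                len <= s->size - off) {
                hva = s->host_addr + off;
                break;
            }
        }
    }
    pthread_mutex_unlock(&mem->lock);
    return hva;
}

/* Guest teardown takes the same lock, so it cannot race with a lookup. */
void guest_terminate(struct vmm_memory *mem)
{
    assert(mem != NULL);
    pthread_mutex_lock(&mem->lock);
    mem->guest_alive = 0;
    mem->nslots = 0;
    pthread_mutex_unlock(&mem->lock);
}
```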

> > > On the other hand, the host
> > > process shouldn't be able to hang the guest either by keeping the
> > > dma-buf alive.
> 
> I'm probably missing something. Could you elaborate on how that could
> happen? (Putting aside any side effects of the guest itself
> misbehaving, but then it's their fault that something bad happens to
> it.)

With dma-bufs backed by guest ram pages the guest can't reuse those
pages for something else as long as the dma-buf exists.

Which isn't much of a problem as long as qemu uses the dma-buf
internally only (qemu can get a linear mapping of a scattered buffer
that way).  Qemu can simply clean up before it acks the resource destroy
command to the guest.  But when handing out dma-buf file handles to
other processes this needs some care.

> > > 2. dma-buf passing only works in the guest->host direction.  It
> > > doesn't work host<->host or guest<->guest (if we decide to support it
> > > in the future) because a guest cannot "see" memory ranges from the
> > > host or other guests.  I don't like this asymmetry but I guess we
> > > could live with it.
> 
> Why? The VMM given a host-side DMA-buf FD could mmap it and insert
> into the guest address space, using KVM_SET_USER_MEMORY_REGION or some
> other means (PCI memory bar?).

Yes, pci memory bar, for address space management reasons.

> > > I wonder if it would be cleaner to extend virtio-gpu for this use case
> > > instead of trying to pass buffers over AF_VSOCK.
> 
> We've been having some discussion about those other use cases and I
> believe there are several advantages of not tying this to virtio-gpu,
> e.g.
> 
> 1) vsock with SCM_RIGHTS would keep the POSIX socket API, for higher
> reusability of the userspace,

I have my doubts that it is possible to make this fully transparent.
For starters only certain kinds of file handles will work.
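
For reference, this is roughly the userspace API in question, sketched
over an AF_UNIX socket, where SCM_RIGHTS is known to work.  Whether the
same code could run unchanged over AF_VSOCK is exactly what is under
discussion, so treat that part as an open assumption.  send_fd and
recv_fd are illustrative helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor plus a single payload byte. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

On the same host the kernel simply installs a new reference to the same
open file in the receiver.  Across the guest/host boundary something
has to recreate an equivalent object on the other side, which is why
only some kinds of file handles can work.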

> 2) no need for different userspace proxies for different protocols
> (mojo, wayland) and different transports (vsock, virtio-gpu),

I'm not convinced it is that easy.  wayland clients for example expect
they can pass sysv shm handles.

One problem with sysv shm is that you can resize buffers.  Which in turn
is the reason why we have memfd with sealing these days.
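
To make the sealing point concrete, a small sketch using memfd_create()
and F_ADD_SEALS: once F_SEAL_SHRINK and F_SEAL_GROW are set, the
receiver of the fd can rely on the size staying fixed.  The two helper
names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a memfd of `size` bytes and seal it against resizing.
 * F_SEAL_SEAL prevents any further changes to the seal set.
 * Returns the fd, or -1 on error. */
int create_sealed_memfd(const char *name, off_t size)
{
    int fd;

    assert(size > 0);
    fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the kernel refuses to resize the fd, 0 otherwise. */
int resize_is_blocked(int fd, off_t new_size)
{
    return ftruncate(fd, new_size) < 0 && errno == EPERM;
}
```

The contents stay writable here; F_SEAL_WRITE would be needed to freeze
them as well.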

> 3) no tying of other capabilities (e.g. video encoder, Mojo IPC
> pass-through, crypto) to the existence of virtio-gpu in the system,

OK.

> 4) simpler from the virtio interfaces point of view - no need for all
> the virtio-gpu complexity for a simple IPC pass-through,

OK.

> 5) could be a foundation for implementing sharing of other objects,
> e.g. guest shared memory by looking up guest pages in the host and
> constructing a shared memory object out of it (useful for shm-based
> wayland clients that allocate shm memory themselves).

For one: see the sysv shm note above.

Second: How do you want to create a sysv shm object on the host side?

Third: Any plan for passing virtio-gpu resources to the host side when
running wayland over virtio-vsock?  With dumb buffers it's probably not
much of a problem, you can grab a list of pages and run with it.  But
for virgl-rendered resources (where the rendered data is stored in a
host texture) I can't see how that will work without copying around the
data.

cheers,
  Gerd