Re: passing FDs across domains

Hi everyone,

On Mon, Mar 18, 2019 at 9:45 PM Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx> wrote:
>
> [Tomasz wants to comment, adding him to CC]
>

Thanks Tomeu!

> On 11/22/17 11:54 AM, Stefan Hajnoczi wrote:
> > On Thu, Nov 16, 2017 at 3:51 PM, Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:
> >>>>> I thought of SCM_RIGHTS on AF_VSOCK, which would return BADF if a FD is
> >>>>> passed that cannot be shared in the requested direction. Are there any
> >>>>> better options?
> >>>>
> >>>> When limiting this SCM_RIGHTS support to dma-bufs, guest -> host, that
> >>>> could actually work.
> >>>
> >>> Would you go with SCM_RIGHTS as defined for AF_UNIX, or with a different,
> >>> more specific name?
> >>
> >> Hmm, good question.  Maybe a separate name is less confusing.
> >>

Thanks for the comprehensive discussion and the many useful thoughts.

Let me introduce myself first. I work on the Chromium OS team on
video and camera support in Linux and related components. I've been
involved in the Linux gpu/drm/ and media/ subsystems for a few years,
but I'm relatively new to the KVM/virtualization world.

While Tomeu and others have been mostly considering the graphics use
cases here (2D, 3D, Wayland), we've been looking into a wider range of
use cases involving passing buffers between guest and host:
 - video decoder/encoder (the general idea of the hardware may be
inferred from corresponding V4L2 APIs modeling those:
https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html),
 - generic IPC pass-through (e.g. Mojo, used extensively in Chromium),
 - crypto hardware accelerators.

> >>>>   The nice thing about dma-bufs is that the size is
> >>>> fixed and the pages are pinned.
> >>>
> >>> Those pages are permanently pinned? Would have expected to be only pinned
> >>> while scanning out, and such.
> >>
> >> Hmm, not fully sure, maybe only when someone holds a reference.
> >> Didn't look at them too deeply from the kernel side.
> >
> >  From a AF_VSOCK perspective there are two things that worry me:
> >
> > 1. Denial of Service.  A mechanism that involves pinning and relies on
> > guest cooperation needs to be careful to prevent buggy or malicious
> > guests from hogging resources that the host cannot reclaim.
> >

I think we're talking about guest pages pinned by the guest kernel,
right? The host (VMM) would still be the ultimate owner of all the
memory given to the guest (VM), so how would pinning make it impossible
for the host to reclaim that memory? Of course, forcefully removing
those pages from the guest could crash it, but if the guest was
malicious/buggy, I wouldn't expect it to work correctly anyway.

> > Imagine a process on the host is about to access the shared memory and
> > the guest resets its virtio-vsock device or terminates.  What happens
> > now?  Does the host process get a SIGBUS upon memory access?  That
> > would be bad for the host process.

There are 2 cases here:

1) The buffer was allocated by the host (in response to a guest
request) and exposed to the guest by some means: assigning a handle
and/or, if direct access is desired, installing it in the guest
address space via KVM_SET_USER_MEMORY_REGION or some other mechanism
(someone mentioned a PCI memory BAR?).

In that case, the host would add the buffer into some internal
bookkeeping structures, together with a single reference count (since
the buffer was given to the guest). If the guest goes away, that
reference is dropped, allocation released and buffer removed from the
bookkeeping structures.

Host-side accesses to those buffers are protected by the host kernel,
by means of fd duplication (userspace) or kernel-side DMA-buf
reference counts (kernel DMA-buf importers).

So handling the scenario described above boils down to properly
synchronizing guest reset/termination handling against grabbing
references to the buffer on the host side. A simple mutex on the VMM
buffer bookkeeping structures should do it, although a less
contentious scheme could also be developed if needed.
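
As a rough sketch of the bookkeeping described above (all names here
are hypothetical and not taken from any existing VMM; host-side users
are modeled as extra reference counts for simplicity, whereas in
practice they would hold duplicated fds or kernel DMA-buf references):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VMM-side record of one host-allocated buffer. */
struct vmm_buffer {
    int  refcount;     /* 1 for the guest, plus 1 per host-side user */
    bool guest_owned;  /* true until the guest resets or terminates */
    bool released;     /* set once the allocation would be freed */
};

/* Hypothetical bookkeeping table, guarded by a single mutex. */
struct vmm_buffer_table {
    pthread_mutex_t lock;
    struct vmm_buffer *bufs;
    size_t count;
};

/* Drop one reference; caller must hold the table lock. */
static void vmm_buffer_put_locked(struct vmm_buffer *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        b->released = true;  /* a real VMM would free the buffer here */
}

/* Host side: take a reference before use; fails if the guest is gone. */
static bool vmm_buffer_get(struct vmm_buffer_table *t, size_t idx)
{
    bool ok = false;

    pthread_mutex_lock(&t->lock);
    if (idx < t->count && t->bufs[idx].guest_owned) {
        t->bufs[idx].refcount++;
        ok = true;
    }
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Host side: drop a reference taken with vmm_buffer_get(). */
static void vmm_buffer_put(struct vmm_buffer_table *t, size_t idx)
{
    pthread_mutex_lock(&t->lock);
    vmm_buffer_put_locked(&t->bufs[idx]);
    pthread_mutex_unlock(&t->lock);
}

/* Guest reset/termination: drop the guest's reference on every buffer. */
static void vmm_guest_gone(struct vmm_buffer_table *t)
{
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < t->count; i++) {
        if (t->bufs[i].guest_owned) {
            t->bufs[i].guest_owned = false;
            vmm_buffer_put_locked(&t->bufs[i]);
        }
    }
    pthread_mutex_unlock(&t->lock);
}
```

Because vmm_buffer_get() and vmm_guest_gone() take the same lock, a
host user either gets its reference before the guest's reference is
dropped (and the buffer stays alive until vmm_buffer_put()), or it is
refused cleanly afterwards.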

2) The buffer was allocated entirely by the guest, from regular guest
pages. (Currently we don't support this in Chromium OS.)

I assume the guest would send a list of pages (physical addresses,
pfns?) to share. The host should be able to translate those into its
own user memory addresses, since they would have been inserted into
the guest by the host itself (using KVM_SET_USER_MEMORY_REGION). Then,
whoever wants to access such memory on the host side would have to
import it into the host kernel, using some kind of userptr API, which
would pin those pages on the host side.

With that, even if the guest goes away, the corresponding host memory
would still be available, provided that the above look-up and import
procedure is mutually exclusive with the guest termination code. Again,
a simple mutex protecting the VMM user memory bookkeeping structures
should work.
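
The look-up step could be sketched roughly as follows. The struct and
function names are hypothetical; the fields simply mirror what the VMM
would have passed to KVM_SET_USER_MEMORY_REGION, and the caller is
assumed to hold the bookkeeping mutex mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one region the VMM installed in the guest. */
struct vmm_memslot {
    uint64_t guest_phys_addr;  /* start of the region in the guest */
    uint64_t memory_size;      /* size of the region in bytes */
    uint64_t userspace_addr;   /* start of the backing VMM mapping */
};

/*
 * Translate a guest-physical range into a host user address.
 * Returns 0 if the range is not fully contained in a single slot,
 * so a guest cannot name memory the VMM never gave it.
 */
static uint64_t vmm_gpa_to_hva(const struct vmm_memslot *slots,
                               size_t nslots, uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nslots; i++) {
        const struct vmm_memslot *s = &slots[i];
        uint64_t off;

        if (gpa < s->guest_phys_addr)
            continue;
        off = gpa - s->guest_phys_addr;
        /* Written this way to avoid overflow in off + len. */
        if (off < s->memory_size && len <= s->memory_size - off)
            return s->userspace_addr + off;
    }
    return 0;
}
```

The resulting address is what would then be handed to whatever
userptr-style import pins the pages on the host side.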

>  On the other hand, the host
> > process shouldn't be able to hang the guest either by keeping the
> > dma-buf alive.

I'm probably missing something. Could you elaborate on how that could
happen? (Putting aside any side effects of the guest itself
misbehaving; in that case it's the guest's own fault if something bad
happens to it.)

> >
> > 2. dma-buf passing only works in the guest->host direction.  It
> > doesn't work host<->host or guest<->guest (if we decide to support it
> > in the future) because a guest cannot "see" memory ranges from the
> > host or other guests.  I don't like this asymmetry but I guess we
> > could live with it.

Why? Given a host-side DMA-buf FD, the VMM could mmap it and insert it
into the guest address space, using KVM_SET_USER_MEMORY_REGION or some
other means (PCI memory BAR?).
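
A minimal sketch of that path, assuming the exporter's mmap yields a
mapping that KVM accepts as guest memory (that is an assumption and
may not hold for every DMA-buf exporter). The function name and the
choice of slot number and guest address are hypothetical; the ioctl
and struct are the regular KVM userspace API.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Map buf_fd into the VMM and expose it to the guest at
 * guest_phys_addr. "slot" is a free memory slot picked by the VMM.
 * Returns the host mapping on success, NULL on failure.
 */
static void *vmm_map_fd_into_guest(int vm_fd, int buf_fd, size_t size,
                                   uint32_t slot, uint64_t guest_phys_addr)
{
    void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     buf_fd, 0);
    if (hva == MAP_FAILED)
        return NULL;

    struct kvm_userspace_memory_region region = {
        .slot = slot,
        .flags = 0,
        .guest_phys_addr = guest_phys_addr,
        .memory_size = size,
        .userspace_addr = (uint64_t)(uintptr_t)hva,
    };

    /* vm_fd is the descriptor returned by KVM_CREATE_VM. */
    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
        munmap(hva, size);
        return NULL;
    }
    return hva;
}
```

The guest would still need to be told where the buffer landed, e.g. by
returning guest_phys_addr in the reply to its request.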

Alternatively, for hosts that cannot insert memory into the guest (are
there such?), an implementation with shadow buffers and transfers
could be provided, as in current virtio-gpu.

> >
> > I wonder if it would be cleaner to extend virtio-gpu for this use case
> > instead of trying to pass buffers over AF_VSOCK.

We've been having some discussion about those other use cases and I
believe there are several advantages of not tying this to virtio-gpu,
e.g.

1) vsock with SCM_RIGHTS would keep the POSIX socket API, so existing
userspace code could be reused more easily,

2) no need for different userspace proxies for different protocols
(mojo, wayland) and different transports (vsock, virtio-gpu),

3) no tying of other capabilities (e.g. video encoder, Mojo IPC
pass-through, crypto) to the existence of virtio-gpu in the system,

4) simpler from the virtio interfaces point of view - no need for all
the virtio-gpu complexity for a simple IPC pass-through,

5) could be a foundation for implementing sharing of other objects,
e.g. guest shared memory by looking up guest pages in the host and
constructing a shared memory object out of it (useful for shm-based
wayland clients that allocate shm memory themselves).
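
To illustrate point 1, this is roughly what fd passing looks like with
SCM_RIGHTS on AF_UNIX today. SCM_RIGHTS on AF_VSOCK is only the
proposal under discussion here, so the sketch assumes that a vsock
variant would keep the same sendmsg()/recvmsg() shape; the helper
names are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one fd as SCM_RIGHTS ancillary data. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* at least one byte of real data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } u;
    struct msghdr msg = {0};

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {0};
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Userspace written against helpers like these would not need to know
whether the peer is another host process or lives across the VM
boundary, which is the reusability argument above.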

Any thoughts?

Best regards,
Tomasz


