Hi everyone,

On Mon, Mar 18, 2019 at 9:45 PM Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx> wrote:
>
> [Tomasz wants to comment, adding him to CC]
>

Thanks Tomeu!

> On 11/22/17 11:54 AM, Stefan Hajnoczi wrote:
> > On Thu, Nov 16, 2017 at 3:51 PM, Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:
> >>>>> I thought of SCM_RIGHTS on AF_VSOCK, which would return BADF if a FD is
> >>>>> passed that cannot be shared in the requested direction. Are there any
> >>>>> better options?
> >>>>
> >>>> When limiting this SCM_RIGHTS support to dma-bufs, guest -> host, that
> >>>> could actually work.
> >>>
> >>> Would you go with SCM_RIGHTS as defined for AF_UNIX, or with a different,
> >>> more specific name?
> >>
> >> Hmm, good question. Maybe a separate name is less confusing.
> >>

Thanks for the comprehensive discussion and a lot of useful thoughts.

Let me introduce myself first. I'm working for the Chromium OS team on
video and camera support in Linux and related components. I've been
involved in the Linux gpu/drm/ and media/ subsystems for a few years,
but I'm relatively new to the KVM/virtualization world.

While Tomeu and others have been mostly considering the graphics use
cases here (2D, 3D, Wayland), we've been looking into a wider range of
use cases involving passing buffers between guest and host:
 - video decoder/encoder (the general idea of the hardware may be
   inferred from corresponding V4L2 APIs modeling those:
   https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html),
 - generic IPC pass-through (e.g. Mojo, used extensively in Chromium),
 - crypto hardware accelerators.

> >>>> The nice thing about dma-bufs is that the size is
> >>>> fixed and the pages are pinned.
> >>>
> >>> Those pages are permanently pinned? Would have expected to be only pinned
> >>> while scanning out, and such.
> >>
> >> Hmm, not fully sure, maybe only when someone holds a reference.
> >> Didn't look at them too deeply from the kernel side.
> >
> > From a AF_VSOCK perspective there are two things that worry me:
> >
> > 1. Denial of Service. A mechanism that involves pinning and relies on
> > guest cooperation needs to be careful to prevent buggy or malicious
> > guests from hogging resources that the host cannot reclaim.
> >

I think we're talking about guest pages pinned by the guest kernel,
right? The host (VMM) would still be the ultimate owner of all the
memory given to the guest (VM), so how would it make it impossible for
the host to reclaim that memory? Of course, forcefully removing those
pages from the guest could possibly crash it, but if it was
malicious/buggy, I wouldn't expect it to work correctly anyway.

> > Imagine a process on the host is about to access the shared memory and
> > the guest resets its virtio-vsock device or terminates. What happens
> > now? Does the host process get a SIGBUS upon memory access? That
> > would be bad for the host process.

There are 2 cases here:

1) The buffer was allocated by the host (in response to a guest
request) and exposed to the guest via some means (assigning a handle
and/or installing it in the guest address space via
KVM_SET_USER_MEMORY_REGION or some other means (someone mentioned PCI
memory bar?), if direct access is desired).

In that case, the host would add the buffer into some internal
bookkeeping structures, together with a single reference count (since
the buffer was given to the guest). If the guest goes away, that
reference is dropped, the allocation released and the buffer removed
from the bookkeeping structures.

Host-side accesses to those buffers are protected by the host kernel,
by means of fd duplication (userspace) or kernel-side DMA-buf reference
counts (kernel DMA-buf importers).

So handling the scenario described above boils down to properly
synchronizing the guest reset/termination handling versus grabbing
references to the buffer on the host side.
A simple mutex on the VMM buffer bookkeeping structures should do it,
although a less contentious scheme could be developed if needed too.

2) The buffer was allocated entirely by the guest, from regular guest
pages. (Currently we don't support this in Chromium OS.)

I assume the guest would send a list of pages (physical addresses,
pfns?) to share. The host should be able to translate those into its
own user memory addresses, since they would have been inserted into the
guest by the host itself (using KVM_SET_USER_MEMORY_REGION). Then,
whoever wants to access such memory on the host side would have to
import that to the host kernel, using some kind of userptr API, which
would pin those pages on the host side.

With that, even if the guest goes away, the corresponding host memory
would still be available, given that the above look-up and import
procedure is mutually exclusive with the guest termination code. Again,
a simple mutex protecting the VMM user memory bookkeeping structures
should work.

> > On the other hand, the host
> > process shouldn't be able to hang the guest either by keeping the
> > dma-buf alive.

I'm probably missing something. Could you elaborate on how that could
happen? (Putting aside any side effects of the guest itself
misbehaving, but then it's their fault that something bad happens to
it.)

> >
> > 2. dma-buf passing only works in the guest->host direction. It
> > doesn't work host<->host or guest<->guest (if we decide to support it
> > in the future) because a guest cannot "see" memory ranges from the
> > host or other guests. I don't like this asymmetry but I guess we
> > could live with it.

Why? The VMM, given a host-side DMA-buf FD, could mmap it and insert it
into the guest address space, using KVM_SET_USER_MEMORY_REGION or some
other means (PCI memory BAR?). Alternatively, for hosts without the
capability of inserting memory (are there such?), an implementation
with shadow buffers and transfers could be provided, as in current
virtio-gpu.
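
To make the "simple mutex on the bookkeeping structures" idea from case
1) a bit more concrete, below is a minimal userspace sketch. It is only
an illustration under stated assumptions, not code from any existing
VMM: all names (struct buf_table, buf_register(), buf_get(), buf_put(),
guest_gone()) are made up for this example, the table is a fixed-size
array, and the actual allocation, DMA-buf export and
KVM_SET_USER_MEMORY_REGION handling are left out entirely. The point is
just that reference grabbing and guest teardown serialize on one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFS 16

/* Hypothetical VMM-side record for one buffer shared with the guest. */
struct shared_buf {
    bool in_use;
    unsigned int handle; /* name the guest uses for the buffer */
    unsigned int refs;   /* 1 for the guest, plus 1 per host-side user */
};

struct buf_table {
    pthread_mutex_t lock; /* serializes lookups against guest teardown */
    bool guest_alive;
    struct shared_buf bufs[MAX_BUFS];
};

void table_init(struct buf_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->guest_alive = true;
    for (size_t i = 0; i < MAX_BUFS; i++)
        t->bufs[i].in_use = false;
}

/* Caller must hold t->lock. */
static void drop_ref_locked(struct shared_buf *b)
{
    assert(b->in_use && b->refs > 0);
    if (--b->refs == 0)
        b->in_use = false; /* a real VMM would release the allocation here */
}

/* Record a buffer given to the guest; the guest holds the first reference. */
int buf_register(struct buf_table *t, unsigned int handle)
{
    int ret = -1;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; t->guest_alive && i < MAX_BUFS; i++) {
        if (!t->bufs[i].in_use) {
            t->bufs[i] = (struct shared_buf){ true, handle, 1 };
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}

/* Host-side user takes a reference; NULL if guest is gone or handle unknown. */
struct shared_buf *buf_get(struct buf_table *t, unsigned int handle)
{
    struct shared_buf *found = NULL;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; t->guest_alive && i < MAX_BUFS; i++) {
        if (t->bufs[i].in_use && t->bufs[i].handle == handle) {
            found = &t->bufs[i];
            found->refs++;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}

/* Host-side user drops its reference. */
void buf_put(struct buf_table *t, struct shared_buf *b)
{
    pthread_mutex_lock(&t->lock);
    drop_ref_locked(b);
    pthread_mutex_unlock(&t->lock);
}

/* Guest reset/termination: drop the guest's reference on every buffer.
 * Returns how many buffers are still kept alive by host-side users. */
unsigned int guest_gone(struct buf_table *t)
{
    unsigned int still_held = 0;

    pthread_mutex_lock(&t->lock);
    if (t->guest_alive) {
        t->guest_alive = false;
        for (size_t i = 0; i < MAX_BUFS; i++) {
            if (!t->bufs[i].in_use)
                continue;
            drop_ref_locked(&t->bufs[i]);
            if (t->bufs[i].in_use)
                still_held++;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return still_held;
}
```

With this scheme, a host process that called buf_get() before the guest
went away keeps a valid buffer until its own buf_put(), while any
buf_get() issued after guest_gone() simply fails instead of racing with
the teardown.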
> >
> > I wonder if it would be cleaner to extend virtio-gpu for this use case
> > instead of trying to pass buffers over AF_VSOCK.

We've been having some discussion about those other use cases and I
believe there are several advantages of not tying this to virtio-gpu,
e.g.

1) vsock with SCM_RIGHTS would keep the POSIX socket API, for higher
   reusability of the userspace,
2) no need for different userspace proxies for different protocols
   (mojo, wayland) and different transports (vsock, virtio-gpu),
3) no tying of other capabilities (e.g. video encoder, Mojo IPC
   pass-through, crypto) to the existence of virtio-gpu in the system,
4) simpler from the virtio interfaces point of view - no need for all
   the virtio-gpu complexity for a simple IPC pass-through,
5) could be a foundation for implementing sharing of other objects,
   e.g. guest shared memory by looking up guest pages in the host and
   constructing a shared memory object out of it (useful for shm-based
   wayland clients that allocate shm memory themselves).

Any thoughts?

Best regards,
Tomasz