Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Mon, 7 Oct 2019 23:14:13 +0900

Hi Dmitry,

On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
<dmitry.morozov@xxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> We at OpenSynergy are also working on an abstract paravirtualized video
> streaming device that operates on input and/or output data buffers and can be used
> as a generic video decoder/encoder/input/output device.
>
> We would be glad to share our thoughts and contribute to the discussion.
> Please see some comments regarding buffer allocation inline.
>
> Best regards,
> Dmitry.
>
> On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > Hi Gerd,
> >
> > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:
> > >   Hi,
> > >
> > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > device to use buffers allocated by virtio-gpu device.
> > > >
> > > > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > First of all, thanks for taking a look at this RFC and for valuable
> > feedback. Sorry for the late reply.
> >
> > For reference, Keiichi is working with me and David Stevens on
> > accelerated video support for virtual machines and integration with
> > other virtual devices, like virtio-gpu for rendering or our
> > currently-downstream virtio-wayland for display (I believe there is
> > ongoing work to solve this problem in upstream too).
> >
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, where would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this:
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> >  5) host device copies data from input guest pages to host buffer
> >  6) host device runs the decoding
> >  7) host device copies decoded frame to output guest pages
> >  8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
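
The two copies in steps 5 and 7 of that setup can be sketched roughly as
below. This is only an illustration under assumptions: struct guest_range,
gather_to_host() and scatter_to_guest() are hypothetical names, and the
guest pages are assumed to be already mapped into the host device's
address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one guest page range as mapped by the host device. */
struct guest_range {
    uint8_t *host_va;   /* host mapping of the guest range (assumed) */
    size_t len;         /* length of the range in bytes */
};

/* Step 5: gather scattered guest input pages into one host buffer. */
size_t gather_to_host(uint8_t *host_buf, size_t host_len,
                      const struct guest_range *in, size_t n)
{
    size_t copied = 0;

    for (size_t i = 0; i < n; i++) {
        /* Sketch only: a real device would reject the request instead. */
        assert(in[i].len <= host_len - copied);
        memcpy(host_buf + copied, in[i].host_va, in[i].len);
        copied += in[i].len;
    }
    return copied;
}

/* Step 7: scatter a decoded frame from the host buffer to guest pages. */
size_t scatter_to_guest(const struct guest_range *out, size_t n,
                        const uint8_t *host_buf, size_t host_len)
{
    size_t copied = 0;

    for (size_t i = 0; i < n && copied < host_len; i++) {
        size_t left = host_len - copied;
        size_t chunk = out[i].len < left ? out[i].len : left;

        memcpy(out[i].host_va, host_buf + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```

Both helpers return the number of bytes moved, so the caller can tell a
short output buffer from a complete frame.
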
> While GPU side allocations can be useful (especially in the case of a decoder),
> it could be more practical to stick to driver side allocations. This is also
> due to the fact that paravirtualized encoders and cameras do not necessarily
> require a GPU device.
>
> Also, the v4l2 framework already features convenient helpers for CMA and SG
> allocations. The buffers can be used in the same manner as in virtio-gpu:
> buffers are first attached to an already allocated buffer/resource descriptor and
> then are made available for processing by the device using a dedicated command
> from the driver.

First of all, thanks a lot for your input. This is a relatively new
area of virtualization and we definitely need to collect various
possible perspectives in the discussion.

From the Chrome OS point of view, there are several aspects for which
guest-side allocation doesn't really work well:
1) host-side hardware has a lot of specific low level allocation
requirements, like alignments, paddings, address space limitations and
so on, which is not something that can be (easily) taught to the guest
OS,
2) allocation system is designed to be centralized, like Android
gralloc, because there is almost never a case when a buffer is to be
used only with 1 specific device. 99% of the cases are pipelines like
decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
encoder and so on, which means that allocations need to take into
account multiple hardware constraints.
3) protected content decoding: the memory for decoded video frames
must not be accessible to the guest at all
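
To illustrate point 2, here is a minimal sketch of how a centralized
allocator might merge the constraints of every device in a pipeline
before allocating. The struct fields and the names buf_constraints,
merge_constraints() and padded_stride() are hypothetical and not taken
from gralloc or any other existing allocator; alignments are assumed to
be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device allocation constraints (illustration only). */
struct buf_constraints {
    uint32_t align;         /* required start alignment, power of two */
    uint32_t stride_align;  /* required line stride alignment, power of two */
    uint64_t max_addr;      /* highest address the device can reach */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Merge two constraint sets so that one buffer satisfies both devices. */
struct buf_constraints merge_constraints(struct buf_constraints a,
                                         struct buf_constraints b)
{
    struct buf_constraints out;

    /* For powers of two, the larger alignment also satisfies the smaller. */
    out.align = max_u32(a.align, b.align);
    out.stride_align = max_u32(a.stride_align, b.stride_align);
    /* The buffer must be reachable by the most limited device. */
    out.max_addr = a.max_addr < b.max_addr ? a.max_addr : b.max_addr;
    return out;
}

/* Round a line width in bytes up to the merged stride alignment. */
uint32_t padded_stride(uint32_t width_bytes, const struct buf_constraints *c)
{
    assert(c->stride_align != 0 &&
           (c->stride_align & (c->stride_align - 1)) == 0);
    return (width_bytes + c->stride_align - 1) & ~(c->stride_align - 1);
}
```

For a decoder -> GPU pipeline one would call merge_constraints() once per
extra device and allocate against the result, which is knowledge the
guest OS normally does not have.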

That said, the common desktop Linux model is based on allocating from
the producer device (which is why videobuf2 has allocation capability) and
we definitely need to consider this model, even if we just think about
Linux V4L2 compliance. That's why I'm suggesting the unified memory
handling based on guest physical addresses, which would handle both
guest-allocated and host-allocated memory.
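
As a rough sketch of what I mean, a buffer could be described to the
device as a list of guest physical ranges, in the same layout no matter
who allocated the memory. The names below (mem_entry, buf_desc and the
two helpers) are hypothetical and are not part of the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather entry: one contiguous guest-physical range. */
struct mem_entry {
    uint64_t guest_phys_addr;
    uint32_t length;
};

/* Hypothetical buffer descriptor. The layout is the same whether the
 * pages were allocated by the guest driver or exported by the host. */
struct buf_desc {
    uint32_t num_entries;
    const struct mem_entry *entries;
};

/* Total size in bytes described by the scatter-gather list. */
uint64_t buf_desc_size(const struct buf_desc *d)
{
    uint64_t total = 0;

    assert(d != NULL);
    for (uint32_t i = 0; i < d->num_entries; i++)
        total += d->entries[i].length;
    return total;
}

/* Translate a byte offset within the buffer into a guest physical
 * address. Returns 0 on success, -1 if the offset is out of range. */
int buf_desc_offset_to_gpa(const struct buf_desc *d, uint64_t offset,
                           uint64_t *gpa)
{
    assert(d != NULL && gpa != NULL);
    for (uint32_t i = 0; i < d->num_entries; i++) {
        if (offset < d->entries[i].length) {
            *gpa = d->entries[i].guest_phys_addr + offset;
            return 0;
        }
        offset -= d->entries[i].length;
    }
    return -1;
}
```

The device only ever sees such descriptors, so it does not need to know
whether the pages behind them are guest RAM or host memory that was
assigned guest physical addresses.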

Best regards,
Tomasz

> >
> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > dma-buf export.
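
A registry like that could be as simple as the sketch below. It is only
an illustration under assumptions: the fixed-size table, the names
buffer_registry, registry_register() and registry_lookup() are
hypothetical, and the 16-byte id is assumed to be generated elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGISTRY_SLOTS 16

/* Hypothetical registry slot: a 16-byte id and an opaque buffer. */
struct registry_entry {
    uint8_t uuid[16];
    void *buffer;       /* NULL marks a free slot */
};

/* Must be zero-initialized before first use. */
struct buffer_registry {
    struct registry_entry slots[REGISTRY_SLOTS];
};

/* Importer side: find a buffer by id, or return NULL if unknown. */
void *registry_lookup(const struct buffer_registry *r, const uint8_t uuid[16])
{
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        if (r->slots[i].buffer != NULL &&
            memcmp(r->slots[i].uuid, uuid, 16) == 0)
            return r->slots[i].buffer;
    }
    return NULL;
}

/* Exporter side: register a buffer under an id (e.g. on dma-buf export).
 * Returns 0 on success, -1 if the id is taken or the table is full. */
int registry_register(struct buffer_registry *r, const uint8_t uuid[16],
                      void *buffer)
{
    assert(buffer != NULL);
    if (registry_lookup(r, uuid) != NULL)
        return -1;
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        if (r->slots[i].buffer == NULL) {
            memcpy(r->slots[i].uuid, uuid, 16);
            r->slots[i].buffer = buffer;
            return 0;
        }
    }
    return -1;
}
```
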
> >
> > This approach could possibly let us handle this transparently to
> > importers, which would work for guest kernel subsystems that rely on
> > the ability to handle buffers like native memory (e.g. having a
> > sgtable or DMA address) for them.
> >
> > How about allocating guest physical addresses for memory corresponding
> > to those buffers? In the virtio-gpu example, that could work like
> > this:
> >  - by default a virtio-gpu buffer has only a resource handle,
> >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > virtio-gpu device export the buffer to a host framework (inside the
> > VMM) that would allocate guest page addresses for it, which the
> > command would return in a response to the guest,
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
> >  - any consumer would be able to handle such a buffer like regular
> > guest memory, passing low-level scatter-gather tables to the host as
> > buffer descriptors - this would nicely integrate with the basic case
> > without buffer sharing, as described above.
> >
> > Another interesting side effect of the above approach would be the
> > ease of integration with virtio-iommu. If the virtio master device is
> > put behind a virtio-iommu, the guest page addresses become the input
> > to iommu page tables and IOVA addresses go to the host via the virtio
> > master device protocol, inside the low-level scatter-gather tables.
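
The address flow with an IOMMU in between can be modelled with a toy
single-level table, as sketched below. Everything here (toy_iommu, the
two functions, the 8-page IOVA space) is hypothetical and much simpler
than the real virtio-iommu interface; it only shows guest page addresses
going in on the driver side and IOVAs being resolved on the host side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES  8    /* size of the toy IOVA space, in pages */

/* Hypothetical single-level table: IOVA page number -> guest page. */
struct toy_iommu {
    uint64_t gpa[TOY_NUM_PAGES];    /* page-aligned guest physical address */
    uint8_t valid[TOY_NUM_PAGES];
};

/* Driver side: map one guest page at the given page-aligned IOVA.
 * Returns 0 on success, -1 if out of range or already mapped. */
int toy_iommu_map(struct toy_iommu *mmu, uint64_t iova, uint64_t gpa)
{
    uint64_t idx = iova >> TOY_PAGE_SHIFT;

    assert((iova | gpa) % TOY_PAGE_SIZE == 0);
    if (idx >= TOY_NUM_PAGES || mmu->valid[idx])
        return -1;
    mmu->gpa[idx] = gpa;
    mmu->valid[idx] = 1;
    return 0;
}

/* Host side: resolve an IOVA taken from a scatter-gather entry.
 * Returns 0 on success, -1 if the IOVA is not mapped. */
int toy_iommu_translate(const struct toy_iommu *mmu, uint64_t iova,
                        uint64_t *gpa)
{
    uint64_t idx = iova >> TOY_PAGE_SHIFT;

    if (idx >= TOY_NUM_PAGES || !mmu->valid[idx])
        return -1;
    *gpa = mmu->gpa[idx] + (iova & (TOY_PAGE_SIZE - 1));
    return 0;
}
```
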
> >
> > What do you think?
> >
> > Best regards,
> > Tomasz
> >
> > > Another option would be to pass around both buffer handle and buffer
> > > owner, i.e. instead of "u32 handle" have something like this:
> > >
> > > struct buffer_reference {
> > >         enum device_type; /* pci, virtio-mmio, ... */
> > >         union device_address {
> > >                 struct pci_address pci_addr;
> > >                 u64 virtio_mmio_addr;
> > >                 [ ... ]
> > >         };
> > >         u64 device_buffer_handle; /* device-specific, virtio-gpu could
> > >                                      use resource ids here */
> > > };
> > >
> > > cheers,
> > >
> > >   Gerd