On Fri, Oct 11, 2019 at 5:54 PM Dmitry Morozov
<dmitry.morozov@xxxxxxxxxxxxxxx> wrote:
>
> Hi Tomasz,
>
> On Wednesday, 9 October 2019 05:55:45 CEST Tomasz Figa wrote:
> > On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
> > <dmitry.morozov@xxxxxxxxxxxxxxx> wrote:
> > > Hi Tomasz,
> > >
> > > On Monday, 7 October 2019 16:14:13 CEST Tomasz Figa wrote:
> > > > Hi Dmitry,
> > > >
> > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> > > > <dmitry.morozov@xxxxxxxxxxxxxxx> wrote:
> > > > > Hello,
> > > > >
> > > > > We at OpenSynergy are also working on an abstract paravirtualized
> > > > > video streaming device that operates on input and/or output data
> > > > > buffers and can be used as a generic video
> > > > > decoder/encoder/input/output device.
> > > > >
> > > > > We would be glad to share our thoughts and contribute to the
> > > > > discussion. Please see some comments regarding buffer allocation
> > > > > inline.
> > > > >
> > > > > Best regards,
> > > > > Dmitry.
> > > > >
> > > > > On Saturday, 5 October 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > > > Hi Gerd,
> > > > > >
> > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > > Our prototype implementation uses [4], which allows the
> > > > > > > > virtio-vdec device to use buffers allocated by virtio-gpu
> > > > > > > > device.
> > > > > > > >
> > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > > > >
> > > > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > > > feedback. Sorry for the late reply.
> > > > > >
> > > > > > For reference, Keiichi is working with me and David Stevens on
> > > > > > accelerated video support for virtual machines and integration with
> > > > > > other virtual devices, like virtio-gpu for rendering or our
> > > > > > currently-downstream virtio-wayland for display (I believe there is
> > > > > > ongoing work to solve this problem in upstream too).
> > > > > >
> > > > > > > Well. I think before even discussing the protocol details we need
> > > > > > > a reasonable plan for buffer handling. I think using virtio-gpu
> > > > > > > buffers should be an optional optimization and not a requirement.
> > > > > > > Also the motivation for that should be clear (Let the host decoder
> > > > > > > write directly to virtio-gpu resources, to display video without
> > > > > > > copying around the decoded framebuffers from one device to
> > > > > > > another).
> > > > > >
> > > > > > Just to make sure we're on the same page, where would the buffers
> > > > > > come from if we don't use this optimization?
> > > > > >
> > > > > > I can imagine a setup like this:
> > > > > >
> > > > > > 1) host device allocates host memory appropriate for usage with
> > > > > >    host video decoder,
> > > > > > 2) guest driver allocates arbitrary guest pages for storage
> > > > > >    accessible to the guest software,
> > > > > > 3) guest userspace writes input for the decoder to guest pages,
> > > > > > 4) guest driver passes the list of pages for the input and output
> > > > > >    buffers to the host device,
> > > > > > 5) host device copies data from input guest pages to host buffer,
> > > > > > 6) host device runs the decoding,
> > > > > > 7) host device copies decoded frame to output guest pages,
> > > > > > 8) guest userspace can access decoded frame from those pages; back
> > > > > >    to 3).
> > > > > >
> > > > > > Is that something you have in mind?
> > > > >
> > > > > While GPU side allocations can be useful (especially in case of
> > > > > decoder), it could be more practical to stick to driver side
> > > > > allocations. This is also due to the fact that paravirtualized
> > > > > encoders and cameras do not necessarily require a GPU device.
> > > > >
> > > > > Also, the v4l2 framework already features convenient helpers for CMA
> > > > > and SG allocations. The buffers can be used in the same manner as in
> > > > > virtio-gpu: buffers are first attached to an already allocated
> > > > > buffer/resource descriptor and then are made available for
> > > > > processing by the device using a dedicated command from the driver.
> > > >
> > > > First of all, thanks a lot for your input. This is a relatively new
> > > > area of virtualization and we definitely need to collect various
> > > > possible perspectives in the discussion.
> > > >
> > > > From the Chrome OS point of view, there are several aspects for which
> > > > the guest side allocation doesn't really work well:
> > > > 1) host-side hardware has a lot of specific low level allocation
> > > > requirements, like alignments, paddings, address space limitations and
> > > > so on, which is not something that can be (easily) taught to the guest
> > > > OS,
> > >
> > > I couldn't agree more. There are some changes by Greg to add support for
> > > querying GPU buffer metadata. Probably those changes could be integrated
> > > with 'a framework for cross-device buffer sharing' (something that Greg
> > > mentioned earlier in the thread and that would totally make sense).
> >
> > Did you mean one of Gerd's proposals?
> >
> > I think we need some clarification there, as it's not clear to me
> > whether the framework is host-side, guest-side or both. The approach I
> > suggested would rely on a host-side framework and guest-side wouldn't
> > need any special handling for sharing, because the memory would behave
> > as on bare metal.
> >
> > However allocation would still need some special API to express high
> > level buffer parameters and delegate the exact allocation requirements
> > to the host. Currently virtio-gpu already has such an interface and also
> > has a DRM driver, which were the 2 main reasons for us to use it as
> > the allocator in Chrome OS. (minigbm/cros_gralloc are implemented on
> > top of the Linux DRM API)
>
> Yes, it was about Gerd's proposals. To be honest, I was considering guest
> allocations only. The operation flow in that case might look more or less
> the same: the driver (GPU, Codec/Camera) first allocates a resource
> descriptor on the host side. Then the driver uses the framework from above (so
> support on both sides might be required) to request buffer metadata and does
> allocations on the guest side accordingly. Then it attaches backing storage to
> the host resource.
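
A minimal sketch of the guest-allocation flow described just above (create a
host resource descriptor, query buffer metadata, allocate on the guest,
attach backing storage) might look as follows. Everything here is a
hypothetical model for illustration: HostDevice, BufferMetadata, the 64-byte
row alignment and the 4096-byte page size are assumptions, not part of
virtio-gpu, virtio-vdec or any proposed protocol.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed guest page size


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class BufferMetadata:
    # Hypothetical constraints reported by the host for one resource.
    stride: int  # bytes per row, including padding
    size: int    # total bytes of backing storage required


@dataclass
class HostDevice:
    # Toy stand-in for the host side; not a real virtio device.
    resources: dict = field(default_factory=dict)
    next_id: int = 1

    def create_resource(self, width, height, bytes_per_pixel):
        # Step 1: allocate a resource descriptor on the host side.
        res_id = self.next_id
        self.next_id += 1
        self.resources[res_id] = {
            "dims": (width, height, bytes_per_pixel),
            "backing": None,
        }
        return res_id

    def query_metadata(self, res_id):
        # Step 2: report layout requirements (64-byte rows are an assumption).
        width, height, bpp = self.resources[res_id]["dims"]
        stride = align_up(width * bpp, 64)
        return BufferMetadata(stride=stride, size=stride * height)

    def attach_backing(self, res_id, pages):
        # Step 4: record the guest pages as backing storage for the resource.
        self.resources[res_id]["backing"] = pages


def guest_allocate(meta):
    # Step 3: the guest allocates enough whole pages to cover meta.size.
    page_count = align_up(meta.size, PAGE_SIZE) // PAGE_SIZE
    return [bytearray(PAGE_SIZE) for _ in range(page_count)]


def setup_buffer(host, width, height, bytes_per_pixel):
    res_id = host.create_resource(width, height, bytes_per_pixel)
    meta = host.query_metadata(res_id)
    pages = guest_allocate(meta)
    host.attach_backing(res_id, pages)
    return res_id, meta, pages
```

In this model the guest never needs to know why the host wants a given
stride; it only sizes its allocation from the metadata it was handed.
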
> > > > 2) allocation system is designed to be centralized, like Android
> > > > gralloc, because there is almost never a case when a buffer is to be
> > > > used only with 1 specific device. 99% of the cases are pipelines like
> > > > decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> > > > encoder and so on, which means that allocations need to take into
> > > > account multiple hardware constraints.
> > > > 3) protected content decoding: the memory for decoded video frames
> > > > must not be accessible to the guest at all
> > >
> > > This looks like a valid use case. Would it also be possible for instance
> > > to allocate mem from a secure ION heap on the guest and then to provide
> > > the sgt to the device? We don't necessarily need to map that sgt for
> > > guest access.
> >
> > Could you elaborate on how that would work? Would the secure ION heap
> > implementation use some protocol to request the allocation from the
> > host?
> >
> > Another aspect is that on Chrome OS we don't support pre-reserved
> > carveout heaps, so we need this memory to be allocated by the host
> > dynamically.
>
> My take on this (for a decoder) would be to allocate memory for output buffers
> from a secure ION heap, import in the v4l2 driver, and then to provide those
> to the device using virtio. The device side then uses the dmabuf framework to
> make the buffers accessible for the hardware. I'm not sure about that, it's
> just an idea.

Where is the secure ION heap implemented? On the host or on the guest?
If the latter, how is it ensured that it's really secure?

That said, Chrome OS would use a similar model, except that we don't
use ION. We would likely use minigbm backed by virtio-gpu to allocate
appropriate secure buffers for us and then import them to the V4L2
driver.

Best regards,
Tomasz