On Wednesday, May 20, 2020 at 12:27 -0400, Michael S. Tsirkin wrote:
> On Wed, May 20, 2020 at 12:21:05PM -0400, Nicolas Dufresne wrote:
> > On Wednesday, May 20, 2020 at 12:19 +0900, Alexandre Courbot wrote:
> > > On Wed, May 20, 2020 at 2:29 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> > > > On Tuesday, May 19, 2020 at 17:37 +0900, Keiichi Watanabe wrote:
> > > > > Hi Nicolas,
> > > > >
> > > > > On Fri, May 15, 2020 at 8:38 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> > > > > > On Monday, May 11, 2020 at 20:49 +0900, Keiichi Watanabe wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks Saket for your feedback. As Dmitry mentioned, we're focusing on
> > > > > > > video encoding and decoding, not camera. So, my reply was about how to
> > > > > > > implement paravirtualized video codec devices.
> > > > > > >
> > > > > > > On Mon, May 11, 2020 at 8:25 PM Dmitry Sepp <dmitry.sepp@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > Hi Saket,
> > > > > > > >
> > > > > > > > On Monday, May 11, 2020 13:05:53 CEST Saket Sinha wrote:
> > > > > > > > > Hi Keiichi,
> > > > > > > > >
> > > > > > > > > I do not support the approach of QEMU implementation forwarding
> > > > > > > > > requests to the host's vicodec module since this can limit the scope
> > > > > > > > > of the virtio-video device only for testing,
> > > > > > > >
> > > > > > > > That was my understanding as well.
> > > > > > >
> > > > > > > Not really because the API which the vicodec provides is V4L2 stateful
> > > > > > > decoder interface [1], which are also used by other video drivers on
> > > > > > > Linux.
> > > > > > > The difference between vicodec and actual device drivers is that
> > > > > > > vicodec performs decoding in the kernel space without using special
> > > > > > > video devices. In other words, vicodec is a software decoder in kernel
> > > > > > > space which provides the same interface with actual video drivers.
> > > > > > > Thus, if the QEMU implementation can forward virtio-video requests to
> > > > > > > vicodec, it can forward them to the actual V4L2 video decoder devices
> > > > > > > as well and VM gets access to a paravirtualized video device.
> > > > > > >
> > > > > > > The reason why we discussed vicodec in the previous thread was it'll
> > > > > > > allow us to test the virtio-video driver without hardware requirement.
> > > > > > >
> > > > > > > [1]
> > > > > > > https://www.kernel.org/doc/html/latest/media/uapi/v4l/dev-decoder.html
> > > > > > >
> > > > > > > >
> > > > > > > > > which instead can be used with multiple use cases such as -
> > > > > > > > >
> > > > > > > > > 1. VM gets access to paravirtualized camera devices which shares the
> > > > > > > > > video frames input through actual HW camera attached to Host.
> > > > > > > >
> > > > > > > > This use-case is out of the scope of virtio-video. Initially I had a plan to
> > > > > > > > support capture-only streams like camera as well, but later the decision was
> > > > > > > > made upstream that camera should be implemented as separate device type. We
> > > > > > > > still plan to implement a simple frame capture capability as a downstream
> > > > > > > > patch though.
> > > > > > > >
> > > > > > > > > 2. If Host has multiple video devices (especially in ARM SOCs over
> > > > > > > > > MIPI interfaces or USB), different VM can be started or hotplugged
> > > > > > > > > with selective video streams from actual HW video devices.
> > > > > > > >
> > > > > > > > We do support this in our device implementation. But spec in general has no
> > > > > > > > requirements or instructions regarding this. And it is in fact flexible
> > > > > > > > enough to provide abstraction on top of several HW devices.
> > > > > > > >
> > > > > > > > > Also instead of using libraries like Gstreamer in Host userspace, they
> > > > > > > > > can also be used inside the VM userspace after getting access to
> > > > > > > > > paravirtualized HW camera devices .
> > > > > > >
> > > > > > > Regarding Gstreamer, I intended this video decoding API [2]. If QEMU
> > > > > > > can translate virtio-video requests to this API, we can easily support
> > > > > > > multiple platforms.
> > > > > > > I'm not sure how feasible it is though, as I have no experience of
> > > > > > > using this API by myself...
> > > > > >
> > > > > > Not sure which API you aim exactly, but what one need to remember is that
> > > > > > mapping virtio-video CODEC on top of VAAPI, V4L2 Stateless, NVDEC or other type
> > > > > > of "stateless" CODEC is not trivial and can't be done without userspace. Notably
> > > > > > because we don't want to do bitstream parsing in the kernel on the main CPU as
> > > > > > security would otherwise be very hard to guaranty. The other driver using same
> > > > > > API as virtio-video do bitstream parsing on a dedicated co-processor (through
> > > > > > firmware blobs though).
> > > > > >
> > > > > > Having bridges between virtio-video, qemu and some abstraction library like
> > > > > > FFMPEG or GStreamer is certainly the best solution if you want to virtualize any
> > > > > > type of HW accelerated decoder or if you need to virtualized something
> > > > > > proprietary (like NVDEC). Please shout if you need help.
> > > > > >
> > > > >
> > > > > Yeah, I meant we should map virtio-video commands to a set of
> > > > > abstracted userspace APIs to avoid having many platform-dependent code
> > > > > in QEMU.
> > > > > This is the same with what we implemented in crosvm, a VMM on
> > > > > ChromiumOS. Crosvm's video device translates virtio-video commands
> > > > > into our own video decoding APIs [1, 2] which supports VAAPI, V4L2
> > > > > stateful and V4L2 stateless.
> > > > > Unfortunately, since our library is
> > > > > highly depending on Chrome, we cannot reuse this for QEMU.
> > > > >
> > > > > So, I agree that using FFMPEG or GStreamer is a good idea. Probably,
> > > > > APIs in my previous link weren't for this purpose.
> > > > > Nicolas, do you know any good references for FFMPEG or GStreamer's
> > > > > abstracted video decoding APIs? Then, I may be able to think about how
> > > > > virtio-video protocols can be mapped to them.
> > > >
> > > > The FFMpeg API for libavcodec can be found here:
> > > >
> > > >   http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/avcodec.h
> > > >
> > > > GStreamer does not really have such a low level CODEC API. So while
> > > > it's possible to use it (Wine project uses it for it's parsers as an
> > > > example, and Firefox use to have CODEC support wrapping GStreamer
> > > > CODEC), there will not be any one-to-one mapping. GStreamer is often
> > > > chosen as it's LGPL code does not carry directly any patented
> > > > implementation. It instead rely on plugins, which maybe provided as
> > > > third party, allowing to distribute your project while giving uses the
> > > > option to install potentially non-free technologies.
> > > >
> > > > But overall, I can describe GStreamer API for CODEC wrapping (pipeline
> > > > less) as:
> > > >
> > > >   - Push GstCaps describing the stream format
> > > >   - Push bitstream buffer on sink pad
> > > >   - When ready, buffers will be pushed through the push function
> > > >     callback on src pad
> > > >
> > > > Of course nothing prevent adding something like the vda abstraction in
> > > > qemu and make this multi-backend capable.
> > >
> > > My understanding is that we don't need a particularly low-level API to
> > > interact with. The host virtual device is receiving the whole encoded
> > > data, and can thus easily reconstruct the original stream (minus the
> > > container) and pass it to ffmpeg/gstreamer. So we can be pretty
> > > high-level here.
> > >
> > > Now the choice of API will also determine whether we want to allow
> > > emulation of codec devices, or whether we stay on a purely
> > > para-virtual track. If we use e.g. gstreamer, then the host can
> > > provide a virtual device that is backed by a purely software
> > > implementation. This can be useful for testing purposes, but for
> > > real-life usage the guest would be just as well using gstreamer
> > > itself.
> >
> > Agreed.
> >
> > > If we want to make sure that there is hardware on the host side, then
> > > an API like libva might make more sense, but it would be more
> > > complicated and may not support all hardware (I don't know if the V4L2
> > > backends are usable for instance).
> >
> > To bring VAAPI into Qemu directly you'd have to introduce bitstream
> > parser, DPB management and other CODEC specific bits. I cannot speak
> > for the project, but that's re-inventing the wheel again with very
> > little gain. Best is to open the discussion with them early.
> >
> > Note that it's relatively simple in both framework to only choose HW
> > accelerated CODECs. In ffmpeg, HW accelerator codecs can only be used
> > with HWContext, so your wrapper need to know specific HWContext for the
> > specific accelerator. In GStreamer, since 1.16, we add a metadata that
> > let the user know which decoder is hardware accelerated. (This is
> > usually used to disable HW acceleration at the moment).
>
> I don't know too much about the options here, unfortunately. But I
> wonder about security implications of all these approaches.
>
> We have this issue with other cases such as libusb where the
> library we are using is not expecting hostile input so does
> not validate it fully.
> This is often the case for pass-through approaches.
> Do all the options here expect untrusted input?

Both projects care about this as much as the ChromeOS backend does.
FFMPEG is notably the main backend in Firefox, and GStreamer is used in
many embedded applications.
We haven't started a complete rewrite in Rust (yet), though.

Bitstream parsers (which are strictly required for VAAPI and V4L2
Stateless CODEC handling through virtio-video) will always have potential
security issues: they deal with user-supplied bitstreams and a very large
number of parameters. A Rust rewrite only protects you from someone
taking control through buffer overflows; it does not mean your code won't
still have a few crashers triggered by a hostile bitstream. The logical
thing to do, if this gets integrated into QEMU, would be to sandbox this
bit. If you already virtualize your GPU, you likely have larger issues,
as on many GPUs malicious shaders can freeze a few GPU cores for multiple
seconds (or forever if you have older GPU drivers or a GPU that does not
have preemption/reset support).

Writing a backend from scratch just for QEMU will likely lead to little
or no maintenance, as it would be very niche within the project. Relying
strictly on the ChromeOS backend would mean a world without HEVC and
without interlaced content, but in my view that is already better than
redoing all of it. That said, it's unclear whether Google will maintain a
stable API there, something that GStreamer and FFMPEG seem to do well
now. It was also mentioned in this discussion that it was not really an
option, but I haven't yet understood why.

There are plenty of approaches that could be taken, of course. One could
completely abstract that backend and use PipeWire to stream the buffers
between a sandboxed CODEC manager service and your QEMU instance (the
codec handling could even run in a PipeWire real-time node to guarantee
the lowest latency). Or you could go with a custom but more targeted
design. I think that's all open, depending on who implements it and what
the requirements are. It also depends on the direction the QEMU project
is taking for resource management (or whether that's delegated somehow,
I don't know). For CODECs, resource availability can vary quite a lot.
Some V4L2 stateful drivers offer only 1 or 2 instances, which cannot be
multiplexed. The highest resolution and rate might only be possible for
1 stream, too. Most VAAPI / V4L2 stateless drivers can be multiplexed
without bound, but won't operate in real time anymore if you have too
many streams. So I think that, from a QEMU point of view, the backend
should support a few constraints, which in a real-life deployment will
end up having to be configured manually.

These are all the sorts of things you need userspace for. Basically,
where I want to get to is that the kernel will never fully offer this
service.
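
To make the constraints idea a bit more concrete, here is a minimal
sketch in Python of what a manually configured admission check could
look like in a userspace backend. Everything in it is hypothetical: the
names BackendConstraints, CodecBackend and try_open, and the choice of
an instance count plus a total pixel-rate budget (width * height * fps)
as the two limits, are my own assumptions for illustration. None of it
is part of virtio-video, QEMU, or any existing library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackendConstraints:
    """Hypothetical limits an administrator would configure by hand."""
    max_instances: int   # e.g. 1 or 2 for some stateful V4L2 drivers
    max_pixel_rate: int  # assumed budget: sum of width * height * fps


@dataclass
class CodecBackend:
    """Hypothetical userspace backend tracking its open streams."""
    constraints: BackendConstraints
    # Pixel rate of every stream currently open on this backend.
    streams: List[int] = field(default_factory=list)

    def try_open(self, width: int, height: int, fps: int) -> bool:
        """Admit a new stream only if both configured limits still hold."""
        rate = width * height * fps
        if len(self.streams) >= self.constraints.max_instances:
            return False  # no free decoder instance left
        if sum(self.streams) + rate > self.constraints.max_pixel_rate:
            return False  # would no longer run in real time
        self.streams.append(rate)
        return True
```

With a budget of a single 3840x2160 stream at 30 fps and two instances,
the first try_open(3840, 2160, 30) is admitted and a following
try_open(1920, 1080, 30) is refused, even though an instance is still
free. A real backend would likely need per-codec and per-profile limits
as well; this only shows where such policy would live.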