Hi Nicolas,

I'm detaching this thread from our V4L2 stateless decoding spec since it
has drifted off and would certainly be interesting to DRM folks as well!

For context: I was initially talking about writing up support for the
Allwinner 2D engine as a DRM render driver, where I'd like to be able to
batch jobs that affect the same destination buffer to only signal the
out fence once when the batch is done. We have a similar issue in v4l2
where we'd like the destination buffer for a set of requests (each
covering one H264 slice) to be marked as done once the set was decoded.

On Wednesday, 17 April 2019 at 12:22 -0400, Nicolas Dufresne wrote:
> > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > 2D graphics blitter that has limited output scaling abilities which
> > > > imply handling a large scaling operation as multiple clipped smaller
> > > > scaling operations. The issue is basically that multiple jobs have to
> > > > be submitted to complete a single frame and relying on an indication
> > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > that all the operations were completed, since we get the indication at
> > > > each step instead of at the end of the batch.
> > >
> > > That looks similar to the IMX.6 IPU m2m driver. It splits the image in
> > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > been around for a long time, so I guess they have a solution to that.
> > > They don't need requests, because there is nothing to be bundled with
> > > the input image. I know that Renesas folks have started working on a
> > > de-interlacer. Again, this kind of driver may process and reuse input
> > > buffers for motion compensation, but I don't think they need special
> > > userspace API for that.
> >
> > Thanks for the reference!
> > I hope it's not a blitter that was
> > contributed as a V4L2 driver instead of DRM, as it probably would be
> > more useful in DRM (but that's way beside the point).
>
> DRM does not offer a generic and discoverable interface for these
> accelerators. Note that these drivers have most of the time started as
> DRM drivers and their DRM side was dropped. That was the case for
> Exynos drivers at least.

Heh, sadly I'm aware of how things turn out most of the time. The thing
is that DRM expects drivers to implement their own interface. That's
fine for passing BOs with GPU bitstream and textures, but not so much
for dealing with framebuffer-based operations where the streaming and
buffer interface that v4l2 has is a good fit.

There's also the fact that the 2D pipeline is fixed-function and highly
hardware-specific, so we need driver-specific job descriptions to really
make the most of it. That's where v4l2 is not much of a good fit for
complex 2D pipelines either. Most 2D engines can take multiple inputs
and blit them together in various ways, which is too far from what v4l2
deals with. So we can have fixed single-buffer pipelines with at best
CSC and scaling, but not much more with v4l2 really.

I don't think it would be too much work to bring an interface to DRM in
order to describe render framebuffers (we only have display framebuffers
so far), with a simple queuing interface for scheduling driver-specific
jobs, which could be grouped together to only signal the out fences when
every buffer of the batch was done being rendered. This last point would
allow handling cases where userspace needs to perform multiple
operations to carry out the single operation that it needs to do. In the
case of my 2D blitter, that would be scaling above a 1024x1024
destination, which could be required to scale a video buffer up to a
1920x1080 display. With that, we can e.g.
page flip the 2D engine destination buffer and be certain that scaling
will be fully done when the fence is signaled.

There's also the userspace problem: DRM render has mesa to back it in
userspace and provide a generic API for other programs. For 2D engines,
we don't have much to hold on to. Cairo has a DRM render interface that
supports a few DRM render drivers where there is either a 2D pipeline or
where pre-built shaders are used to implement a 2D pipeline, and that's
about it as far as I know.

There's also the possibility of writing up a drm-render DDX to handle
these 2D blitters that can make things a lot faster when running a
desktop environment. As for wayland, well, I don't really know what to
think. I was under the impression that it relies on GL for 2D
operations, but am really not sure how true that actually is.

> The thing is that DRM is great if you do immediate display stuff, while
> V4L2 is nice if you do streaming, where you expect filling queued, and
> popping buffers from queues.
>
> In the end, this is just an interface, nothing prevents you from making
> an internal driver (like the Meson Canvas) and simply letting multiple
> sub-systems expose it. Specially that some of these IP will often
> support both signal and memory processing, so they equally fit into a
> media controller ISP, a v4l2 m2m or a DRM driver.

Having base drivers that can hook to both v4l2 m2m and DRM would
definitely be awesome. Maybe we could have some common internal
synchronization logic to make writing these drivers easier.

It would be cool if both could be used concurrently and not just return
-EBUSY when the device is used with the other subsystem.

Anyway, that's my 2 cents about the situation and what we can do to
improve it. I'm definitely interested in tackling these items, but it
may take some time before we get there.
Not to mention we need to rework media/v4l2 for per-slice decoding
support ;)

> Another driver you might want to look is Rockchip RGA driver (which is
> a multi function IP, including blitting).

Yep, I'm aware of it as well. There's also vivante which exposes 2D
cores but I'm really not sure whether any function is actually
implemented.

OMAP4 and OMAP5 have a 2D engine that seems to be vivante as well from
what I could find out, but it seems to only have blobs for bltsville and
no significant docs.

Cheers,

Paul
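
P.S. To make the batching idea a bit more concrete, here is a rough
sketch of how a large scaling destination could be split into clipped
jobs where only the last one signals the out fence. Everything in it is
hypothetical: struct blit_job, blit_batch_build() and the MAX_TILE limit
are names made up for illustration, not an existing DRM or V4L2
interface, and a real job description would also carry the source
rectangle and scaling factors.

```c
#include <stddef.h>

#define MAX_TILE 1024 /* assumed per-job destination limit */

/* Hypothetical job description, not an existing DRM or V4L2 structure. */
struct blit_job {
	unsigned int dst_x, dst_y, dst_w, dst_h; /* destination clip */
	int signal_fence; /* non-zero only on the last job of the batch */
};

/*
 * Split a dst_w x dst_h destination into clipped jobs no larger than
 * MAX_TILE x MAX_TILE. Returns the number of jobs written, or 0 if the
 * jobs array (of capacity max_jobs) is too small.
 */
static size_t blit_batch_build(unsigned int dst_w, unsigned int dst_h,
			       struct blit_job *jobs, size_t max_jobs)
{
	size_t count = 0;
	unsigned int x, y;

	for (y = 0; y < dst_h; y += MAX_TILE) {
		for (x = 0; x < dst_w; x += MAX_TILE) {
			struct blit_job *job;

			if (count == max_jobs)
				return 0;

			job = &jobs[count++];
			job->dst_x = x;
			job->dst_y = y;
			job->dst_w = dst_w - x < MAX_TILE ? dst_w - x : MAX_TILE;
			job->dst_h = dst_h - y < MAX_TILE ? dst_h - y : MAX_TILE;
			job->signal_fence = 0;
		}
	}

	/* Only the final job of the batch signals the out fence. */
	if (count)
		jobs[count - 1].signal_fence = 1;

	return count;
}
```

With a 1920x1080 destination this yields four jobs, and a consumer
waiting on the out fence (a page flip, for instance) would only be woken
after the fourth one completes.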