On Wed, Apr 17, 2019 at 08:10:15PM +0200, Paul Kocialkowski wrote:
> Hi Nicolas,
>
> I'm detaching this thread from our V4L2 stateless decoding spec since
> it has drifted off and would certainly be interesting to DRM folks as
> well!
>
> For context: I was initially talking about writing up support for the
> Allwinner 2D engine as a DRM render driver, where I'd like to be able
> to batch jobs that affect the same destination buffer to only signal
> the out fence once when the batch is done. We have a similar issue in
> v4l2 where we'd like the destination buffer for a set of requests (each
> covering one H264 slice) to be marked as done once the set was decoded.
>
> On Wednesday, April 17, 2019 at 12:22 -0400, Nicolas Dufresne wrote:
> > > > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > > > 2D graphics blitter that has limited output scaling abilities which
> > > > > imply handling a large scaling operation as multiple clipped smaller
> > > > > scaling operations. The issue is basically that multiple jobs have to
> > > > > be submitted to complete a single frame and relying on an indication
> > > > > from the destination buffer (such as a fence) doesn't work to indicate
> > > > > that all the operations were completed, since we get the indication at
> > > > > each step instead of at the end of the batch.
> > > >
> > > > That looks similar to the i.MX6 IPU m2m driver. It splits the image in
> > > > tiles of 1024x1024 and processes each tile separately. This driver has
> > > > been around for a long time, so I guess they have a solution to that.
> > > > They don't need requests, because there is nothing to be bundled with
> > > > the input image. I know that Renesas folks have started working on a
> > > > de-interlacer. Again, this kind of driver may process and reuse input
> > > > buffers for motion compensation, but I don't think they need a special
> > > > userspace API for that.
> > >
> > > Thanks for the reference!
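[Editorial aside: the tiling approach described above, where one large
scaling operation becomes several hardware-sized clipped jobs, can be
sketched roughly as follows. This is an illustration in plain Python,
not the actual IPU or blitter driver code; the 1024x1024 limit is the
one mentioned in the discussion.]

```python
# Illustrative sketch only (not real driver code): split a destination
# rectangle into tiles no larger than the hardware's output limit, so a
# single large blit/scale becomes several smaller clipped jobs.
TILE_MAX = 1024  # example hardware limit on output width/height

def split_into_tiles(dst_w, dst_h, tile_max=TILE_MAX):
    """Return (x, y, w, h) destination tiles covering dst_w x dst_h."""
    tiles = []
    for y in range(0, dst_h, tile_max):
        for x in range(0, dst_w, tile_max):
            tiles.append((x, y,
                          min(tile_max, dst_w - x),
                          min(tile_max, dst_h - y)))
    return tiles

# Scaling up to a 1920x1080 display would take 2x2 = 4 hardware jobs,
# which is exactly why a single per-buffer completion signal fires too
# early: it fires per job, not per batch.
jobs = split_into_tiles(1920, 1080)
```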
> > > I hope it's not a blitter that was
> > > contributed as a V4L2 driver instead of DRM, as it probably would be
> > > more useful in DRM (but that's way beside the point).
> >
> > DRM does not offer a generic and discoverable interface for these
> > accelerators. Note that these drivers have most of the time started as
> > DRM drivers and their DRM side was dropped. That was the case for the
> > Exynos drivers at least.
>
> Heh, sadly I'm aware of how things turn out most of the time. The thing
> is that DRM expects drivers to implement their own interface. That's
> fine for passing BOs with GPU bitstream and textures, but not so much
> for dealing with framebuffer-based operations, where the streaming and
> buffer interface that v4l2 has is a good fit.
>
> There's also the fact that the 2D pipeline is fixed-function and highly
> hardware-specific, so we need driver-specific job descriptions to
> really make the most of it. That's where v4l2 is not a good fit for
> complex 2D pipelines either. Most 2D engines can take multiple inputs
> and blit them together in various ways, which is too far from what
> v4l2 deals with. So we can have fixed single-buffer pipelines with at
> best CSC and scaling, but not much more with v4l2 really.
>
> I don't think it would be too much work to bring an interface to DRM in
> order to describe render framebuffers (we only have display
> framebuffers so far), with a simple queuing interface for scheduling
> driver-specific jobs, which could be grouped together to only signal
> the out fences when every buffer of the batch was done being rendered.
> This last point would allow handling cases where userspace needs to
> perform multiple operations to carry out the single operation that it
> needs to do. In the case of my 2D blitter, that would be scaling above
> a 1024x1024 destination, which could be required to scale a video
> buffer up to a 1920x1080 display. With that, we can e.g. page flip the
> 2D engine destination buffer and be certain that scaling will be fully
> done when the fence is signaled.
>
> There's also the userspace problem: DRM render has mesa to back it in
> userspace and provide a generic API for other programs. For 2D
> engines, we don't have much to hold on to. Cairo has a DRM render
> interface that supports a few DRM render drivers where there is either
> a 2D pipeline or where pre-built shaders are used to implement a 2D
> pipeline, and that's about it as far as I know.
>
> There's also the possibility of writing up a drm-render DDX to handle
> these 2D blitters that could make things a lot faster when running a
> desktop environment. As for wayland, well, I don't really know what to
> think. I was under the impression that it relies on GL for 2D
> operations, but am really not sure how true that actually is.

Just fyi in case you folks aren't aware, I typed up a blog a while ago
about why drm doesn't have a 2d submit api:

https://blog.ffwll.ch/2018/08/no-2d-in-drm.html

> > The thing is that DRM is great if you do immediate display stuff, while
> > V4L2 is nice if you do streaming, where you expect to fill queues and
> > pop buffers from queues.
> >
> > In the end, this is just an interface; nothing prevents you from making
> > an internal driver (like the Meson Canvas) and simply letting multiple
> > sub-systems expose it. Especially since some of these IPs will often
> > support both signal and memory processing, so they equally fit into a
> > media controller ISP, a v4l2 m2m or a DRM driver.
>
> Having base drivers that can hook into both v4l2 m2m and DRM would
> definitely be awesome. Maybe we could have some common internal
> synchronization logic to make writing these drivers easier.

We have, it's called dma_fence. It ties into dma_bufs using
reservation_objects.

> It would be cool if both could be used concurrently and not just return
> -EBUSY when the device is used with the other subsystem.
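[Editorial aside: the batching idea that runs through this thread, one
out-fence per destination buffer that signals only after every sub-job
of the batch has completed, can be modeled conceptually like this.
This is plain Python bookkeeping for illustration only; it is not the
kernel's dma_fence API, just the refcount-to-zero idea behind it.]

```python
import threading

# Conceptual model (not kernel code): a batch fence that signals once,
# when the last sub-job of the batch completes, instead of once per job.
class BatchFence:
    def __init__(self, num_jobs):
        self._remaining = num_jobs
        self._lock = threading.Lock()
        self._signaled = threading.Event()

    def job_done(self):
        # Called as each clipped sub-job (e.g. one 1024x1024 tile) finishes.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._signaled.set()  # whole batch done: signal exactly once

    def wait(self, timeout=None):
        # E.g. the display side would wait here before page flipping
        # the 2D engine's destination buffer.
        return self._signaled.wait(timeout)

fence = BatchFence(num_jobs=4)
for _ in range(4):
    fence.job_done()
assert fence.wait(0)  # only signaled after all four jobs completed
```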
We live in this world already :-) I think there are even patches (or
merged already) to add fences to v4l, for Android.

> Anyway, that's my 2 cents about the situation and what we can do to
> improve it. I'm definitely interested in tackling these items, but it
> may take some time before we get there. Not to mention we need to
> rework media/v4l2 for per-slice decoding support ;)
>
> > Another driver you might want to look at is the Rockchip RGA driver
> > (which is a multi-function IP, including blitting).
>
> Yep, I'm aware of it as well. There's also vivante, which exposes 2D
> cores, but I'm really not sure whether any function is actually
> implemented.
>
> OMAP4 and OMAP5 have a 2D engine that seems to be vivante as well from
> what I could find out, but it seems to only have blobs for bltsville
> and no significant docs.

Yeah, that's the usual approach for drm 2d drivers: you have a bespoke
driver in userspace. Usually that means an X driver, but there's been
talk of pimping the hwc interface to make that _the_ 2d accel
interface. There's also fbdev ... *shudder*. All of these options are
geared towards ultimately displaying stuff on screens, not pure m2m 2d
accel.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch