Le mercredi 26 avril 2017 à 18:52 +0200, Tobias Jakobi a écrit : > Hello again, > > > Nicolas Dufresne wrote: > > Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit : > > > Hi Marek, > > > > > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote: > > > > Hi Laurent, > > > > > > > > On 2017-04-20 12:25, Laurent Pinchart wrote: > > > > > Hi Marek, > > > > > > > > > > (CC'ing Sakari Ailus) > > > > > > > > > > Thank you for the patches. > > > > > > > > > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote: > > > > > > Dear all, > > > > > > > > > > > > This is an updated proposal for extending EXYNOS DRM API with generic > > > > > > support for hardware modules, which can be used for processing image data > > > > > > from the one memory buffer to another. Typical memory-to-memory operations > > > > > > are: rotation, scaling, colour space conversion or mix of them. This is a > > > > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer > > > > > > processors", which has been rejected as "not really needed in the DRM > > > > > > core": > > > > > > http://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg146286.html > > > > > > > > > > > > In this proposal I moved all the code to Exynos DRM driver, so now this > > > > > > will be specific only to Exynos DRM. I've also changed the name from > > > > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion > > > > > > with fbdev API. > > > > > > > > > > > > Here is a bit more information what picture processors are: > > > > > > > > > > > > Embedded SoCs are known to have a number of hardware blocks, which perform > > > > > > such operations. They can be used in paralel to the main GPU module to > > > > > > offload CPU from processing grapics or video data. One of example use of > > > > > > such modules is implementing video overlay, which usually requires color > > > > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to > > > > > > target window size. > > > > > > > > > > > > The proposed API is heavily inspired by atomic KMS approach - it is also > > > > > > based on DRM objects and their properties. A new DRM object is introduced: > > > > > > picture processor (called pp for convenience). Such objects have a set of > > > > > > standard DRM properties, which describes the operation to be performed by > > > > > > respective hardware module. In typical case those properties are a source > > > > > > fb id and rectangle (x, y, width, height) and destination fb id and > > > > > > rectangle. Optionally a rotation property can be also specified if > > > > > > supported by the given hardware. To perform an operation on image data, > > > > > > userspace provides a set of properties and their values for given fbproc > > > > > > object in a similar way as object and properties are provided for > > > > > > performing atomic page flip / mode setting. > > > > > > > > > > > > The proposed API consists of the 3 new ioctls: > > > > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture > > > > > > processors, > > > > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture > > > > > > processor, > > > > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given > > > > > > property set. > > > > > > > > > > > > The proposed API is extensible. Drivers can attach their own, custom > > > > > > properties to add support for more advanced picture processing (for example > > > > > > blending). > > > > > > > > > > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing) > > > > > > subsystem. IPP API is over-engineered in general, but not really extensible > > > > > > on the other side. It is also buggy, with significant design flaws - the > > > > > > biggest issue is the fact that the API covers memory-2-memory picture > > > > > > operations together with CRTC writeback and duplicating features, which > > > > > > belongs to video plane. Comparing with IPP subsystem, the PP framework is > > > > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos > > > > > > rotator driver smaller by over 200 lines). > > > > Just a side note, we have written code in GStreamer using the Exnynos 4 > > FIMC IPP driver. I don't know how many, if any, deployment still exist > > (Exynos 4 is relatively old now), but there exist userspace for the > > FIMC driver. > > I was searching for this code, but I didn't find anything. Are you sure > you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver > from the V4L2 subsystem? Oops, I manage to be unclear. Having two drivers on the same IP isn't helping. We wrote code around the FIMC driver on V4L2 side. This driver: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/exynos4-is/fimc-m2m.c And this code: https://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/sys/v4l2/gstv4l2transform.c Unless I have miss-read, the proposal here is to deprecate the V4L side and improve the DRM side (which I stand against in my reply). > > > With best wishes, > Tobias > > > > > We use this for color transformation (from tiled to > > linear) and scaling. The FIMC driver is in fact quite stable in > > upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is > > largely based on it and has received some maintenance to properly work > > in GStreamer. unlike this DRM API, you can reuse the same userspace > > code across multiple platforms (which we do already). We have also > > integrated this driver in Chromium in the past (not upstream though). > > > > I am well aware that the blitter driver has not got much attention > > though. But again, V4L2 offers a generic interface to userspace > > application. Fixing this driver could enable some work like this one: > > > > https://bugzilla.gnome.org/show_bug.cgi?id=772766 > > > > This work in progress feature is a generic hardware accelerated video > > mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I > > believe is in staging right now). Again, unlike the exynos/drm, this > > code could be reused between platforms. > > > > In general, the problem with the DRM approach is that it only targets > > displays. We often need to use these IP block for stream pre/post > > processing outside a "playback" use case. > > > > What I'd like so see instead here, is an approach that helps both world > > instead of trying to win the control over the IP block. Renesas > > development seems to lead toward the right direction by creating > > drivers that can be both interfaced in DRM and V4L2. For IPP and > > GScaler on Exynos, this would be a greater benefit and finally the code > > could be shared, having a single place to fix when we find bugs. > > > > > > > > > > > > This seems to be the kind of hardware that is typically supported by V4L2. > > > > > Stupid question, why DRM ? > > > > > > > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM: > > > > > > > > 1. we want to replace existing Exynos IPP subsystem: > > > > - it is used only in some internal/vendor trees, not in open-source > > > > - we want it to have sane and potentially extensible userspace API > > > > - but we don't want to loose its functionality > > > > > > > > 2. we want to have simple API for performing single image processing > > > > operation: > > > > - typically it will be used by compositing window manager, this means that > > > > some parameters of the processing might change on each vblank (like > > > > destination rectangle for example). This api allows such change on each > > > > operation without any additional cost. V4L2 requires to reinitialize > > > > queues with new configuration on such change, what means that a bunch of > > > > ioctls has to be called. > > > > > > What do you mean by re-initialising the queue? Format, buffers or something > > > else? > > > > > > If you need a larger buffer than what you have already allocated, you'll > > > need to re-allocate, V4L2 or not. > > > > > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to > > > implementing that and some work in videobuf2. > > > > > > Another thing is that V4L2 is very stream oriented. For most devices that's > > > fine as a lot of the parameters are not changeable during streaming, > > > especially if the pipeline is handled by multiple drivers. That said, for > > > devices that process data from memory to memory performing changes in the > > > media bus formats and pipeline configuration is not very efficient > > > currently, largely for the same reason. > > > > > > The request API that people have been working for a bit different use cases > > > isn't in mainline yet. It would allow more efficient per-request > > > configuration than what is currently possible, but it has turned out to be > > > far from trivial to implement. > > > > > > > - validating processing parameters in V4l2 API is really complicated, > > > > because the parameters (format, src&dest rectangles, rotation) are being > > > > set incrementally, so we have to either allow some impossible, > > > > transitional > > > > configurations or complicate the configuration steps even more (like > > > > calling some ioctls multiple times for both input and output). In the end > > > > all parameters have to be again validated just before performing the > > > > operation. > > > > > > You have to validate the parameters in any case. In a MC pipeline this takes > > > place when the stream is started. > > > > > > > > > > > 3. generic approach (to add it to DRM core) has been rejected: > > > > http://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg146286.html > > > > > > For GPUs I generally understand the reasoning: there's a very limited number > > > of users of this API --- primarily because it's not an application > > > interface. > > > > > > If you have a device that however falls under the scope of V4L2 (at least > > > API-wise), does this continue to be the case? Will there be only one or two > > > (or so) users for this API? Is it the case here? > > > > > > Using a device specific interface definitely has some benefits: there's no > > > need to think how would you generalise the interface for other similar > > > devices. There's no need to consider backwards compatibility as it's not a > > > requirement. The drawback is that the applications that need to support > > > similar devices will bear the burden of having to support different APIs. > > > > > > I don't mean to say that you should ram whatever under V4L2 / MC > > > independently of how unworkable that might be, but there are also clear > > > advantages in using a standardised interface such as V4L2. > > > > > > V4L2 has a long history behind it and if it was designed today, I bet it > > > would look quite different from what it is now. > > > > > > > > > > > 4. this api can be considered as extended 'blit' operation, other DRM > > > > drivers > > > > (MGA, R128, VIA) already have ioctls for such operation, so there is also > > > > place in DRM for it > > > > Note that I am convince that using these custom IOCTL within a > > "compositor" implementation is much easier and uniform compared to > > using a v4l2 driver. It probably offers lower latency. But these are > > non-generic and are not a great fit for streaming purpose. Request API > > and probably explicit fence may mitigate this though. Meanwhile, there > > is some indication that even though complex, there is already some > > people that do think implementing a compositor combining V4L2 and DRM > > is feasible. > > > > http://events.linuxfoundation.org/sites/events/files/slides/als2015_way > > land_weston_v2.pdf > > > > > > > > Added LMML to cc. > > > > Thanks. > > > >
Attachment:
signature.asc
Description: This is a digitally signed message part