On Tue, Sep 05, 2023 at 02:33:26PM +0300, Pekka Paalanen wrote: > On Mon, 4 Sep 2023 14:29:56 +0000 > "Shankar, Uma" <uma.shankar@xxxxxxxxx> wrote: > > > > -----Original Message----- > > > From: Sebastian Wick <sebastian.wick@xxxxxxxxxx> > > > Sent: Thursday, August 31, 2023 2:46 AM > > > To: Shankar, Uma <uma.shankar@xxxxxxxxx> > > > Cc: Harry Wentland <harry.wentland@xxxxxxx>; intel- > > > gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; wayland- > > > devel@xxxxxxxxxxxxxxxxxxxxx; Ville Syrjala <ville.syrjala@xxxxxxxxxxxxxxx>; Pekka > > > Paalanen <pekka.paalanen@xxxxxxxxxxxxx>; Simon Ser <contact@xxxxxxxxxxx>; > > > Melissa Wen <mwen@xxxxxxxxxx>; Jonas Ådahl <jadahl@xxxxxxxxxx>; Shashank > > > Sharma <shashank.sharma@xxxxxxx>; Alexander Goins <agoins@xxxxxxxxxx>; > > > Naseer Ahmed <quic_naseer@xxxxxxxxxxx>; Christopher Braga > > > <quic_cbraga@xxxxxxxxxxx> > > > Subject: Re: [RFC 00/33] Add Support for Plane Color Pipeline > > > > > > On Wed, Aug 30, 2023 at 08:47:37AM +0000, Shankar, Uma wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Harry Wentland <harry.wentland@xxxxxxx> > > > > > Sent: Wednesday, August 30, 2023 12:56 AM > > > > > To: Shankar, Uma <uma.shankar@xxxxxxxxx>; > > > > > intel-gfx@xxxxxxxxxxxxxxxxxxxxx; dri- devel@xxxxxxxxxxxxxxxxxxxxx > > > > > Cc: wayland-devel@xxxxxxxxxxxxxxxxxxxxx; Ville Syrjala > > > > > <ville.syrjala@xxxxxxxxxxxxxxx>; Pekka Paalanen > > > > > <pekka.paalanen@xxxxxxxxxxxxx>; Simon Ser <contact@xxxxxxxxxxx>; > > > > > Melissa Wen <mwen@xxxxxxxxxx>; Jonas Ådahl <jadahl@xxxxxxxxxx>; > > > > > Sebastian Wick <sebastian.wick@xxxxxxxxxx>; Shashank Sharma > > > > > <shashank.sharma@xxxxxxx>; Alexander Goins <agoins@xxxxxxxxxx>; > > > > > Naseer Ahmed <quic_naseer@xxxxxxxxxxx>; Christopher Braga > > > > > <quic_cbraga@xxxxxxxxxxx> > > > > > Subject: Re: [RFC 00/33] Add Support for Plane Color Pipeline > > > > > > > > > > +CC Naseer and Chris, FYI > > > > > > > > > > See https://patchwork.freedesktop.org/series/123024/ for whole series. > > > > > > > > > > On 2023-08-29 12:03, Uma Shankar wrote: > > > > > > Introduction > > > > > > ============ > > > > > > > > > > > > Modern hardwares have various color processing capabilities both > > > > > > at pre-blending and post-blending phases in the color pipeline. > > > > > > The current drm implementation exposes only the post-blending > > > > > > color hardware blocks. Support for pre-blending hardware is missing. > > > > > > There are multiple use cases where pre-blending color hardware > > > > > > will be > > > > > > useful: > > > > > > a) Linearization of input buffers encoded in various transfer > > > > > > functions. > > > > > > b) Color Space conversion > > > > > > c) Tone mapping > > > > > > d) Frame buffer format conversion > > > > > > e) Non-linearization of buffer(apply transfer function) > > > > > > f) 3D Luts > > > > > > > > > > > > and other miscellaneous color operations. > > > > > > > > > > > > Hence, there is a need to expose the color capabilities of the > > > > > > hardware to user-space. This will help userspace/middleware to use > > > > > > display hardware for color processing and blending instead of > > > > > > doing it through GPU shaders. > > > > > > > > > > > > > > > > Thanks, Uma, for sending this. I've been working on something > > > > > similar but you beat me to it. :) > > > > > > > > Thanks Harry for the useful feedback and overall collaboration on this so far. > > > > > > > > > > > > > > > > Work done so far and relevant references > > > > > > ======================================== > > > > > > > > > > > > Some implementation is done by Intel and AMD/Igalia to address the same. > > > > > > Broad consensus is there that we need a generic API at drm core to > > > > > > suffice the use case of various HW vendors. Below are the links > > > > > > capturing the discussion so far. > > > > > > > > > > > > Intel's Plane Color Implementation: > > > > > > https://patchwork.freedesktop.org/series/90825/ > > > > > > AMD's Plane Color Implementation: > > > > > > https://patchwork.freedesktop.org/series/116862/ > > > > > > > > > > > > > > > > > > Hackfest conclusions > > > > > > ==================== > > > > > > > > > > > > HDR/Color Hackfest was organised by Redhat to bring all the > > > > > > industry stakeholders together and converge on a common uapi > > > expectations. > > > > > > Participants from Intel, AMD, Nvidia, Collabora, Redhat, Igalia > > > > > > and other prominent user-space developers and maintainers. > > > > > > > > > > > > Discussions happened on the uapi expectations, opens, nature of > > > > > > hardware of multiple hardware vendors, challenges in generalizing > > > > > > the same and the path forward. Consensus was made that drm core > > > > > > should implement descriptive APIs and not go with prescriptive > > > > > > APIs. DRM core should just expose the hardware capabilities; > > > > > > enabling, customizing and programming the same should be done by > > > > > > the user-space. Driver should just > > > > > honor the user space request without doing any operations internally. > > > > > > > > > > > > Thanks to Simon Ser, for nicely documenting the design consensus > > > > > > and an UAPI RFC which can be referred to here: > > > > > > > > > > > > https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze > > > > > > _hD5 > > > > > > > > > > > > > > nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1 > > > Q > > > > > Wn48 > > > > > > 8=@emersion.fr/ > > > > > > > > > > > > > > > > > > Design considerations > > > > > > ===================== > > > > > > > > > > > > Following are the important aspects taken into account while > > > > > > designing the current RFC > > > > > > proposal: > > > > > > > > > > > > 1. Individual HW blocks can be muxed. (e.g. out of two HW blocks > > > > > > only one > > > > > can be used) > > > > > > 2. Position of the HW block in the pipeline can be programmable > > > > > > 3. LUTs can be one dimentional or three dimentional > > > > > > 4. Number of LUT entries can vary across platforms > > > > > > 5. Precision of LUT entries can vary across platforms > > > > > > 6. Distribution of LUT entries may vary. e.g Mutli-segmented, > > > Logarithmic, > > > > > > Piece-Wise Linear(PWL) etc > > > > > > 7. There can be parameterized/non-parameterized fixed function HW > > > > > blocks. > > > > > > e.g. Just a hardware bit, to convert from one color space to another. > > > > > > 8. Custom non-standard HW implementation. > > > > > > 9. Leaving scope for some vendor defined pescriptive > > > > > > implementation if > > > > > required. > > > > > > 10.Scope to handle any modification in hardware as technology > > > > > > evolves > > > > > > > > > > > > The current proposal takes into account the above considerations > > > > > > while keeping the implementation as generic as possible leaving > > > > > > scope for future > > > > > additions or modifications. > > > > > > > > > > > > This proposal is also in line to the details mentioned by Simon's > > > > > > RFC covering all the aspects discussed in hackfest. > > > > > > > > > > > > > > > > > > Outline of the implementation > > > > > > ============================ > > > > > > > > > > > > Each Color Hardware block will be represented by a data structure > > > drm_color_op. > > > > > > These color operations will form the building blocks of a color > > > > > > pipeline which best represents the underlying Hardware. Color > > > > > > operations can be re-arranged, substracted or added to create > > > > > > distinct color pipelines to accurately describe the Hardware > > > > > > blocks present in the display > > > > > engine. > > > > > > > > > > Who is doing the arranging of color operations? IMO a driver should > > > > > define one or more respective pipelines that can be selected by > > > > > userspace. This seems to be what you're talking about after (I > > > > > haven't reviewed the whole thing yet). Might be best to drop this sentence or > > > to add clarifications in order to avoid confusion. > > > > > > > > Yes it's the driver who will set the pipeline based on the underlying > > > > hardware arrangement and possible combinations. There can be multiple > > > > pipelines possible if hardware can be muxed or order can be re-arranged (all > > > viable combinations should be defined as a pipeline in driver). > > > > Yeah, I will re-phrase this to help clarify it and avoid any ambiguity. > > > > > > > > > > > > > > > > In this proposal, a color pipeline is represented as an array of > > > > > > struct drm_color_op. For individual color operation, we add blobs > > > > > > to advertise the capability of the particular Hardware block. > > > > > > > > > > > > This color pipeline is then packaged within a blob for the user > > > > > > space to retrieve it. > > > > > > > > > > > > > > > > Would it be better to expose the drm_color_ops directly, instead of > > > > > packing a array of drm_color_ops into a blob and then giving that to > > > userspace. > > > > > > > > Advantage we see in packing in blobs is that interface will be > > > > cleaner. There will be just one GET_COLOR_PIPELINE property to invoke by user > > > and then its just parsing the data. > > > > This way the entire underlying hardware blocks with pipeline will be available to > > > user. > > > > > > I don't see how parsing a blob is easier than requesting the color ops from the > > > kernel. User space is already equiped with getting KMS objects and the igt test > > > code from Harry shows that this is all pretty trivial plumbing. > > > > There are multiple color operations possible with unique lut distribution and > > capabilities. Also the order of hardware blocks and possibility of multiple pipelines. > > Having all the information with one query and property and also be able to set the > > same with just one property call using blobs is better than multiple calls to driver. > > This can be useful in high refresh rate cases where not much time is there to program > > the hardware state. Latency of multiple calls to driver can be avoided. > > Hi, > > querying all that information only happens once, at KMS client start-up. > > Setting up a color pipeline is already a single call: the atomic commit ioctl. Well, clients also issue a bunch of ioctl to set some properties to the desired state and then you might run through a whole bunch of configurations to find one that actually works, so there is a case to be made that there are a lot of ioctls involved. I just don't think this is an issue right now. Nobody has been complaining about the ioctls being a limiting factor so why should we optimize for this? Especially because it brings with it a bunch of disadvantages. Anyway, I agree with the sentiment here: this is not something we should optimize for. > > > > > > > For a particular hardware block in pipeline, user can get the relevant > > > > details from blob representing that particular block. We have created > > > > IGT tests (links mentioned in cover-letter) to demonstrate how it can be done. > > > This is just to clarify the idea. > > > > > > The blob is also not introspectable with the usual tools whereas exposing them as > > > properties would be. > > > > > > It also would, like Pekka correctly noted, bring a whole bunch of issues about > > > compatibility and versioning that are well understood with objects > > > + properties. > > > > The blob should be standardized in the UAPI and structures to parse them should be fixed. > > With this compatibility issues can be prevented. > > I think that is short-sighted. > > > > > Also since info is passed via blobs we have the flexibility to even > > > > define segmented LUTs and PWL hardware blocks. Also we have left scope > > > > for custom private hardware blocks as well which driver can work with > > > respective HAL and get that implemented. > > > > > > When color ops are real KMS objects they still can have properties which are > > > blobs that can store LUTs and other data. > > > > > > And we should avoid private blocks at all cost. In fact, I don't think the KMS rules > > > have changed in that regard and it simply is not allowed. > > > > Private blocks are not standardized but are vendor specific. So generic userspace will > > ignore these. However vendor and its respective HAL can make use of this field and leaves > > a scope to cater to such hardware vendors need. This doesn't affect the expectation of the > > standardized color operations which will be defined as enum in UAPI. > > Vendors can have and expose their own unique snowflake operations > without any "private" as well: pick an unused operation type code, and > document what it does. Advertise it in some pipelines. > > Vendor HALs can make use of it, but it also allows generic userspace to > make use of it at will, and it allows other vendors to implement the > same and benefit from it without needing to patch every userspace. > > Or does Intel not want other vendors to potentially make use of their > UAPIs? > > > > > We can even define prescriptive operations as a private entry and > > > > enable it if a certain driver and HAL agree. > > > > > > > > > > To advertise the available color pipelines, an immutable ENUM > > > > > > property "GET_COLOR_PIPELINE" is introduced. This enum property has > > > blob id's as values. > > > > > > With each blob id representing a distinct color pipeline based on > > > > > > underlying HW capabilities and their respective combinations. > > > > > > > > > > > > Once the user space decides on a color pipeline, it can set the > > > > > > pipeline and the corresponding data for the hardware blocks within > > > > > > the pipeline with the BLOB property "SET_COLOR_PIPELINE". > > > > > > > > > > > > > > > > When I discussed this at the hackfest with Simon he proposed a new > > > > > DRM object, (I've called it a drm_colorop) to represent a color operation. > > > > > Each drm_colorop has a "NEXT" pointer to another drm_colorop, or > > > > > NULL if its the last in the pipeline. You can then have a mutable > > > > > enum property on the plane to discover and select a color pipeline. > > > > > > > > Yes, the proposal is inspired by this idea. Sure, we can work together to enhance > > > the design. > > > > Personally I feel the one proposed in the current RFC will do the same > > > > thing envisioned by Simon and you Harry. Management of the pipeline, > > > > addition, deletion and flexibility to represent hardware is more with blobs with > > > the relevant structures agreed in UAPI. > > > > > > > > > This seems a bit more transparent than a blob. You can see my > > > > > changes (unfortunately very WIP, don't look too closely at > > > > > individual patches) at > > > > > https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5/di > > > > > ffs > > > > > > > > > > libdrm changes: > > > > > https://gitlab.freedesktop.org/hwentland/drm/-/merge_requests/1/diff > > > > > s > > > > > > > > Sure, will check the same. > > > > > > > > > IGT changes: > > > > > https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_reque > > > > > sts/1/diffs > > > > > > > > > > I'll take time to review your whole series and will see whether we > > > > > can somehow keep the best parts of each. > > > > > > > > Thanks and agree. Let's work together and get this implemented in DRM. > > > > > > > > > Curious to hear other opinions on the blob vs new DRM object question as > > > well. > > > > > > I can see a few advantages with the blob approach. > > > > > > User space can store the entire state in one blob and just assign a new blob to > > > change to another pipeline configuration. > > > > Agree > > > > > However, I would argue that changing a lot of properties is already common > > > practice and works well. User space can deal with it and has abstractions to > > > make this managable. > > > > Blob gives a lot of flexibility and ability to define the hardware capabilities generically. > > The structure of the atomic commit ioctl and the KMS property system is > even more flexible. > > > Lut distribution, number of segments, samples in each segment, precision of luts etc. > > can be precisely defined and userspace will get a complete picture of the underlying > > hardware and its capabilities. Also this is being done with just 2 properties. Leaving > > scope for future addition of any standard color operation as well. > > The number of properties does not seem too useful to strictly minimise > over other aspects. > > > Thanks, > pq > > > > > I feel blob approach once properly documented is a bit more flexible and scalable. > > Maybe I am bit biased here 😊 but all ideas are welcome. > > > > We have implemented some IGT's as well to explain the design better. Link below: > > https://patchwork.freedesktop.org/series/123018/ > > > > Thanks Sebastian for the feedback. > > > > Regards, > > Uma Shankar