On Fri, May 5, 2023 at 5:28 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, i.e. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
> >
> > This proposal takes a prescriptive approach instead of a descriptive approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color management
> > policy.
>
> Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> means you need the shaders at the same api level for fallback purposes,
> and we're not going to have that ever in kms. That would need something
> like hwc in userspace to work.

Which would be nice to have, but that would force a specific color
pipeline on everyone, and we explicitly want to avoid that. There are
just too many trade-offs to consider.
> And not generic in its ultimate consequence would mean we just do a blob
> for a crtc with all the vendor register stuff like adf (android display
> framework) does, because I really don't see a point in trying a
> generic-looking-but-not vendor uapi with each color op/stage split out.
>
> So from very far and pure gut feeling, this seems like a good middle
> ground in the uapi design space we have here.

Good to hear!

> > We've decided against mirroring the existing CRTC properties
> > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > pipeline can significantly differ between vendors and this approach cannot
> > accurately abstract all hardware. In particular, the availability, ordering and
> > capabilities of hardware blocks are different on each display engine. So, we've
> > decided to go for a highly detailed hardware capability discovery.
> >
> > This new uAPI should not be in conflict with existing standard KMS properties,
> > since there are none which control the pre-blending color pipeline at the
> > moment. It does conflict with any vendor-specific properties like
> > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > properties. Drivers will need to either reject atomic commits configuring both
> > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > properties and shows the new generic properties when enabled.
> >
> > To use this uAPI, first user-space needs to discover hardware capabilities via
> > KMS objects and properties, then user-space can configure the hardware via an
> > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> >
> > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > property is an enum; each enum entry represents a color pipeline supported by
> > the hardware.
> > The special zero entry indicates that the pipeline is in
> > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > primary plane with 2 supported pipelines but currently configured in bypass
> > mode:
> >
> > Plane 10
> > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > ├─ …
> > └─ "color_pipeline": enum {0, 42, 52} = 0
>
> A bit confused, why is this an enum, and not just an immutable prop that
> points at the first element? You already can disable elements with the
> bypass thing, also bypassing by changing the pointers to the next node in
> the graph seems a bit confusing and redundant.

We want to allow multiple pipelines to exist, and a plane can choose a
pipeline by selecting its first element. The enum here lists all the
possible pipelines that can be attached to the surface.

> > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > objects. The entry value is an object ID pointing to the head of the linked
> > list (the first operation in the color pipeline).
> >
> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
>
> Ok no comments from me on the actual color operations and semantics of all
> that, because I have simply nothing to bring to that except confusion :-)
>
> Some higher level thoughts instead:
>
> - I really like that we just go with graph nodes here. I think that was
>   bound to happen sooner or later with kms (we almost got there with
>   writeback, and with hindsight maybe should have).
>
> - Since there's other use-cases for graph nodes (maybe scaler modes, or
>   histogram samplers for adaptive backlight, or blending that goes beyond
>   the stacked alpha blending we have now) I think we should make this all
>   fairly generic:
>   * Add a new graph node kms object type.
>   * Add a class type so that userspace knows which graph nodes it must
>     understand for a feature (like "ColorOp" on planes here), and which it
>     can ignore (like perhaps a scaler node to control the interpolation)
>   * Probably need to adjust the object property type. Currently that
>     accepts any object of a given type (crtc, fb, blob are the major ones).
>     I think for these graph nodes we want an explicit enumeration of the
>     possible next objects. In kms thus far we've done that with the
>     separate possible_* mask properties, but they're cumbersome.
>   * It sounds like for now we only have immutable next pointers, so that
>     would simplify the first iteration, but we should probably anticipate
>     all this.

Just to be clear: right now we don't expect any pipeline to be a graph,
only linked lists. It probably doesn't hurt to generalize this to graphs,
but that's not what we want to do here (for now).

> - I think the graph node should be built on top of the driver private
>   atomic obj/state stuff, and could then be further subclassed for
>   specific types. It's a bit much stacking, but avoids too much wheel
>   reinventing, and the worst boilerplate can be avoided with some macros
>   that combine the pointer chasing with the container_of upcast. With
>   that you can easily build some helpers to walk the graph for a crtc or
>   plane or whatever really.
>
> - I guess core atomic code should at least do the graph link validation
>   and basic things like that, probably not really more to do. And
>   validating the standard properties on some graph nodes ofc.
>
> - I have no idea how we should support the standardization of the state
>   structures. Doing a separate subclass for each type sounds extremely
>   painful, but unions otoh are ugly. Ideally a type-indexed and type-safe
>   union, but C isn't good enough for that. I do think that we should keep
>   up the goal that standard properties are decoded into state structures
>   in core atomic code, and not in each implementation individually.
> - I think the only other precedent for something like this is the media
>   control api in the media subsystem. I think it'd be really good to get
>   someone like Laurent to ack the graph node infrastructure to make sure
>   we're not missing any lessons they've learned already. If there's
>   anything else we should pull these folks in too ofc.
>
> For merge plan I dropped some ideas already on Harry's rfc for
> vendor-private properties, the only thing to add is that we might want to
> type up the consensus plan into a merged doc like
> Documentation/gpu/rfc/hdr-plane.rst or whatever you feel like for a name.
>
> Cheers, Daniel
>
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > ├─ "lut_size": immutable range = 4096
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 43
> >
> > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation types might
> > have different properties.
> >
> > Here is another example with a 3D LUT:
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > ├─ "lut_size": immutable range = 33
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 43
> >
> > And one last example with a matrix:
> >
> > Color operation 42
> > ├─ "type": enum {Bypass, Matrix} = Matrix
> > ├─ "matrix_data": blob
> > └─ "next": immutable color operation ID = 43
> >
> > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > blocks which can be bypassed instead.]
> >
> > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> >
> > If some hardware supports re-ordering operations in the color pipeline, the
> > driver can expose multiple pipelines with different operation ordering, and
> > user-space can pick the ordering it prefers by selecting the right pipeline.
> > The same scheme can be used to expose hardware blocks supporting multiple
> > precision levels.
> >
> > That's pretty much all there is to it, but as always the devil is in the
> > details.
> >
> > First, we realized that we need a way to indicate where the scaling operation
> > is happening. The contents of the framebuffer attached to the plane might be
> > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > the colorspace scaling is applied in, the result will be different, so we need
> > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > which ones are post-scaling. We introduce a special "scaling" operation type,
> > which is part of the pipeline like other operations but serves an informational
> > role only (effectively, the operation cannot be configured by user-space, all
> > of its properties are immutable). For example:
> >
> > Color operation 43
> > ├─ "type": immutable enum {Scaling} = Scaling
> > └─ "next": immutable color operation ID = 44
> >
> > [Simon note: an alternative would be to split the color pipeline into two, by
> > having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > the way we want to split pre-blending and post-blending. This could be less
> > expressive for drivers; there may be hardware where there are dependencies
> > between the pre- and post-scaling pipeline?]
> >
> > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > be disabled/bypassed.
> > NVIDIA hardware has been designed for descriptive APIs
> > where user-space provides a high-level description of the colorspace
> > conversions it needs to perform, and this is at odds with our KMS uAPI
> > proposal. To address this issue, we suggest adding a special block type which
> > describes a fixed conversion from one colorspace to another and cannot be
> > configured by user-space. Then user-space will need to accommodate its
> > pipeline for these special blocks. Such fixed hardware blocks need to be
> > documented well enough that they can be implemented via shaders.
> >
> > We also noted that it should always be possible for user-space to completely
> > disable the color pipeline and switch back to bypass/identity without a
> > modeset. Some drivers will need to fail atomic commits for some color
> > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > doesn't support curves which are too steep, and Intel doesn't support curves
> > which decrease. This isn't something which routinely happens, but there might
> > be more cases where the hardware needs to reject the pipeline. Thus, when
> > user-space has a running KMS color pipeline, then hits a case where the
> > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > be able to immediately fall back to shaders without any glitch. This doesn't
> > seem to be an issue for AMD, Intel and NVIDIA.
> >
> > This uAPI is extensible: we can add more color operations, and we can add more
> > properties for each color operation type. For instance, we might want to add
> > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > to keep the scope of the proposal manageable.
> >
> > Later on, we plan to re-use the same machinery for post-blending color
> > pipelines.
> > There are some more details about post-blending which have been
> > separately debated at the hackfest, but we believe it's a viable plan. This
> > solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> > we'd like to introduce a client cap to hide the old properties and show the new
> > post-blending color pipeline properties.
> >
> > We envision a future user-space library to translate a high-level descriptive
> > color pipeline into a low-level prescriptive KMS color pipeline ("libliftoff
> > but for color pipelines"). The library could also offer a translation into
> > shaders. This should help share more infrastructure between compositors and
> > ease KMS offloading. This should also help with the NVIDIA case.
> >
> > To wrap things up, let's take a real-world example: how would gamescope [2]
> > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> >
> > AMD would expose the following objects and properties:
> >
> > Plane 10
> > ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > └─ "color_pipeline": enum {0, 42} = 0
> > Color operation 42 (input CSC)
> > ├─ "type": enum {Bypass, Matrix} = Matrix
> > ├─ "matrix_data": blob
> > └─ "next": immutable color operation ID = 43
> > Color operation 43
> > ├─ "type": enum {Scaling} = Scaling
> > └─ "next": immutable color operation ID = 44
> > Color operation 44 (DeGamma)
> > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > └─ "next": immutable color operation ID = 45
> > Color operation 45 (gamut remap)
> > ├─ "type": enum {Bypass, Matrix} = Matrix
> > ├─ "matrix_data": blob
> > └─ "next": immutable color operation ID = 46
> > Color operation 46 (shaper LUT RAM)
> > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > ├─ "1d_curve_type": enum {LUT} = LUT
> > ├─ "lut_size": immutable range = 4096
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 47
> > Color operation 47 (3D LUT RAM)
> > ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > ├─ "lut_size": immutable range = 17
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 48
> > Color operation 48 (blend gamma)
> > ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > ├─ "lut_size": immutable range = 4096
> > ├─ "lut_data": blob
> > └─ "next": immutable color operation ID = 0
> >
> > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > display, gamescope would perform an atomic commit with the following property
> > values:
> >
> > Plane 10
> > └─ "color_pipeline" = 42
> > Color operation 42 (input CSC)
> > └─ "matrix_data" = PQ → scRGB (TF)
> > Color operation 44 (DeGamma)
> > └─ "type" = Bypass
> > Color operation 45 (gamut remap)
> > └─ "matrix_data" = scRGB (TF) → PQ
> > Color operation 46 (shaper LUT RAM)
> > └─ "lut_data" = PQ → Display native
> > Color operation 47 (3D LUT RAM)
> > └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > Color operation 48 (blend gamma)
> > └─ "1d_curve_type" = PQ
> >
> > I hope comparing these properties to the diagrams linked above can help
> > understand how the uAPI would be used and give an idea of its viability.
> >
> > Please feel free to provide feedback! It would be especially useful to have
> > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > would work there.
> >
> > Unless there is a show-stopper, we plan to follow up this RFC with
> > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> >
> > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > Let's work together to make this happen!
> >
> > Simon, on behalf of the hackfest participants
> >
> > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > [2]: https://github.com/ValveSoftware/gamescope
> > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch