Re: [RFC] Plane color pipeline KMS uAPI

Joshua Ashton <joshua@xxxxxxxxx> · Fri, 5 May 2023 18:01:58 +0100

On 5/5/23 15:16, Pekka Paalanen wrote:
On Fri, 5 May 2023 14:30:11 +0100
Joshua Ashton <joshua@xxxxxxxxx> wrote:

Some corrections and replies inline.

On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@xxxxxxxxx> wrote:

On Thu, 04 May 2023 15:22:59 +0000
Simon Ser <contact@xxxxxxxxxxx> wrote:

...

To wrap things up, let's take a real-world example: how would gamescope [2]
configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].

AMD would expose the following objects and properties:

     Plane 10
     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
     └─ "color_pipeline": enum {0, 42} = 0
     Color operation 42 (input CSC)
     ├─ "type": enum {Bypass, Matrix} = Matrix
     ├─ "matrix_data": blob
     └─ "next": immutable color operation ID = 43
     Color operation 43
     ├─ "type": enum {Scaling} = Scaling
     └─ "next": immutable color operation ID = 44
     Color operation 44 (DeGamma)
     ├─ "type": enum {Bypass, 1D curve} = 1D curve
     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
     └─ "next": immutable color operation ID = 45
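
As a reading aid for the listing above, here is a toy Python model (not libdrm or kernel code) of how userspace might follow the "next" links from a plane's selected pipeline. The property names and IDs are copied from the listing. The dict layout and the walk_pipeline helper are hypothetical, and selecting pipeline 42 instead of the listed value 0 is an assumption made for illustration. Operation 45 and beyond are elided in the quoted text, so the walk simply stops at the first ID it does not know.

```python
# Toy model of the objects in the listing; hypothetical layout, not a real API.
color_ops = {
    42: {"label": "input CSC", "type": "Matrix", "next": 43},
    43: {"label": "scaling", "type": "Scaling", "next": 44},
    44: {"label": "DeGamma", "type": "1D curve",
         "1d_curve_type": "sRGB", "next": 45},
}

# Assumption: userspace has switched "color_pipeline" from 0 to 42.
plane = {"id": 10, "type": "Primary", "color_pipeline": 42}


def walk_pipeline(plane, ops):
    """Follow "next" links from the plane's selected pipeline head.

    Returns the visited (id, type) pairs and the ID where the walk stopped
    (0 for "no pipeline", or an ID that is not in ops).
    """
    chain = []
    op_id = plane["color_pipeline"]
    while op_id in ops:
        chain.append((op_id, ops[op_id]["type"]))
        op_id = ops[op_id]["next"]
    return chain, op_id


chain, stopped_at = walk_pipeline(plane, color_ops)
for op_id, op_type in chain:
    print(f"color operation {op_id}: {op_type}")
print(f"walk stopped at ID {stopped_at}")
```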

Some vendors have per-tap degamma and some have a degamma after the sample.
How do we distinguish that behaviour?
It is important to know.
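
To illustrate why the distinction matters, here is a minimal Python sketch (not driver code) comparing the two orders for a single 2-tap linear filter. The decoding function is the standard piecewise sRGB one. The texel values and the 50/50 filter weight are made up for illustration, and real hardware filters may use more taps.

```python
def srgb_eotf(v):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


# Two neighbouring texels, sRGB-encoded: black and white (illustrative).
tap0, tap1 = 0.0, 1.0
weight = 0.5  # sample exactly halfway between them

# Per-tap degamma: decode each texel first, then filter in linear light.
per_tap = lerp(srgb_eotf(tap0), srgb_eotf(tap1), weight)

# Degamma after the sample: filter the encoded values, then decode.
post_sample = srgb_eotf(lerp(tap0, tap1, weight))

print(f"per-tap degamma:     {per_tap:.4f}")
print(f"post-sample degamma: {post_sample:.4f}")
```

The two results differ by more than a factor of two in linear light for this input, so userspace cannot treat the two hardware behaviours as interchangeable.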

...

Btw. ISTR that if you want to do scaling properly with alpha channel,
you need optical values multiplied by alpha. Alpha vs. scaling is just
yet another thing to look into, and TF operations do not work with
pre-mult.

What are your concerns here?

I believe this is exactly the same question as yours about sampling, at
least for up-scaling where sampling the framebuffer interpolates in
some way.

Oh, interpolation mode would fit in the scaling COLOROP...

Having pre-multiplied alpha is fine with a TF: the alpha was
premultiplied in linear, then encoded with the TF by the client.

There are two different ways to pre-multiply: into optical values
(okay), and into electrical values (what everyone actually does, and
what Wayland assumes by default).

What you described is the thing mostly no-one does in GUI graphics.
Even in the web.
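
The difference between the two ways of pre-multiplying can be sketched in a few lines of Python. This is an illustration only: it assumes sRGB as the TF, and the pixel value and alpha are made up. It shows what a later decode stage recovers from each kind of stored value.

```python
def srgb_eotf(v):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def srgb_oetf(l):
    """Encode a linear-light value in [0, 1] with the sRGB curve."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1 / 2.4) - 0.055


color_encoded = 0.5  # straight (non-premultiplied) sRGB-encoded colour
alpha = 0.5

# Optical pre-multiply: multiply in linear light, then encode with the TF.
stored_optical = srgb_oetf(alpha * srgb_eotf(color_encoded))

# Electrical pre-multiply: multiply the encoded value directly.
stored_electrical = alpha * color_encoded

# What a decode stage would recover from each stored value.
decoded_optical = srgb_eotf(stored_optical)
decoded_electrical = srgb_eotf(stored_electrical)

# The value needed for blending in linear light.
wanted = alpha * srgb_eotf(color_encoded)

print(f"wanted:             {wanted:.4f}")
print(f"optical, decoded:   {decoded_optical:.4f}")
print(f"electrical, decoded:{decoded_electrical:.4f}")
```

Decoding the optically pre-multiplied value gives back exactly the linear pre-multiplied colour, while decoding the electrically pre-multiplied value does not, because the TF is non-linear.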

Yeah, I have seen this problem many times before in different fields.

There are not many transparent clients that I know of (most of them are 
Gamescope Overlays), but the ones I do know of do actually do the 
premultiply in linear space (mainly because they use sRGB image views 
for their color attachments so it gets handled for them).

From my perspective and experience, we definitely shouldn't do anything 
to try and 'fix' apps doing their premultiply in the wrong space.

I've had to deal with this before in game development on a transparent 
HUD, and my solution and thinking for that was:
It was authored (or "mastered") with this behaviour in mind. So that's 
what we should do.
It felt bad to 'break' the blending on the HUD of that game, but it 
looked better, and it was what was intended before it was 'fixed' in a 
later engine version.

It is still definitely interesting to think about, but I don't think 
it presents a problem at all.
In fact, doing anything would just 'break' the expected behaviour of apps.

If you think of a TF as something something relative to a bunch of
reference state or whatever then you might think "oh you can't do
that!", but you really can.
It's really best to just think of it as a mathematical encoding of a
value in all instances that we touch.

True, except when it's false. If you assume that decoding is the exact
mathematical inverse of encoding, then your conclusion follows.

Unfortunately, many video standards do not have it so. BT.601, BT.709,
and possibly BT.2020 (SDR) as well (I forget) encode with one function
and decode with something that is not the inverse, and this is a totally
intentional and necessary mangling of the values to get the expected
result on screen. Someone has called this "implicit color management".

So one needs to be very careful here what the actual characteristics
are.
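
As a concrete illustration of a decode that is not the inverse of the encode, here is a small Python sketch. It pairs the BT.709 OETF with a BT.1886-style display EOTF simplified to a pure 2.4 power (zero black level, white normalised to 1.0). Pairing these two functions and the 18% grey input are assumptions chosen for the example, not something stated in this thread.

```python
def bt709_oetf(l):
    """BT.709 camera encoding of scene-linear light in [0, 1]."""
    if l < 0.018:
        return 4.5 * l
    return 1.099 * l ** 0.45 - 0.099


def bt1886_eotf(v, gamma=2.4):
    """Simplified BT.1886 display decoding: zero black level, white = 1.0."""
    return v ** gamma


scene = 0.18                     # 18% grey in scene-linear light
signal = bt709_oetf(scene)       # about 0.409
displayed = bt1886_eotf(signal)  # about 0.117, not 0.18

print(f"scene light:     {scene:.4f}")
print(f"encoded signal:  {signal:.4f}")
print(f"displayed light: {displayed:.4f}")
```

The round trip deliberately does not return the input, so treating the TF as a purely mathematical encoding only holds when the decode in use really is the inverse of the encode.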

The only issue is that you lose precision from having pre-multiplied
alpha as it's quantized to fit into the DRM format rather than using
the full range then getting divided by the alpha at blend time.
In my experience, however, it never ends up being a visible issue at 8bpc.
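
The precision loss can be quantified with a short Python sketch. It assumes 8-bit channels, round-to-nearest pre-multiplication and an alpha of 26/255 (roughly 10% opacity); all three are illustrative choices.

```python
def premultiply_8bit(color8, alpha8):
    """Round-to-nearest 8-bit pre-multiply, done in integer arithmetic."""
    return (color8 * alpha8 + 127) // 255


alpha8 = 26  # roughly 10% opacity (illustrative)

# Every possible 8-bit colour value, pre-multiplied and stored at 8 bits.
stored = {premultiply_8bit(c, alpha8) for c in range(256)}

print(f"distinct colour levels with straight alpha: 256")
print(f"distinct colour levels after pre-multiply:  {len(stored)}")
```

At this alpha the 256 input levels collapse to 27 stored levels, whereas straight alpha would keep all 256 until the multiply at blend time. Whether that loss is visible depends on the content. The contribution of such a pixel to the blended result is small, which fits the observation that it is rarely a problem in practice.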

That's true. Wait, why would you divide by alpha for blending?
Blending/interpolation is the only operation where pre-mult is useful.

I mis-spoke, I meant multiply.

- Joshie 🐸✨

Thanks,
pq

Thanks
  - Joshie 🐸✨

Thanks,
pq

I hope comparing these properties to the diagrams linked above can help
understand how the uAPI would be used and give an idea of its viability.

Please feel free to provide feedback! It would be especially useful to have
someone familiar with Arm SoCs look at this, to confirm that this proposal
would work there.

Unless there is a show-stopper, we plan to follow up this RFC with
implementations for AMD, Intel, NVIDIA, gamescope, and IGT.

Many thanks to everybody who contributed to the hackfest, on-site or remotely!
Let's work together to make this happen!

Simon, on behalf of the hackfest participants

[1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
[2]: https://github.com/ValveSoftware/gamescope
[3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
[4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg