Re: [Intel-gfx] [PATCH 0/6] Pipe level color management

On 22/01/16 15:04, Daniel Stone wrote:
Hi Lionel,

On 21 January 2016 at 15:03, Lionel Landwerlin
<lionel.g.landwerlin@xxxxxxxxx> wrote:
Hi,

This series introduces pipe level color management through a set of properties
attached to the CRTC. It also provides an implementation for some Intel
platforms.

This series is based on a previous set of patches by Shashank Sharma and takes
into account the comments by Daniel Stone & Daniel Vetter.
This is a lot more tractable than previous series, thanks! I think a
lot of the confusion I had around this was from the number of
hardware-specific features stuffed into this, and the manner in which
they were stuffed in. For example, with the previous series, it looks
like you could configure both PRE_CTM and POST_CTM LUTs in 12-bit
mode, which is impossible as the PRM suggests the only way to have
both LUTs active is with split-gamma mode. (For anyone else looking at
the Broadwell PRM, note the split-gamma mode describes the two LUTs
completely backwards: the only thing that makes sense is for the
pre-CTM LUT to have a range of 0..1.0, and the post-CTM LUT to have a
range of -3.0..3.0, rather than the other way around.)

Now with everything just using split-gamma mode, I'm much happier with
how this is looking. I took a look at some other architectures to see
how this would fit, and also had a chat with Richard Hughes to clear
some things up. AMD seems to support every possible mode under the
sun, so should support any API we came up with. Most other
architectures only implemented a single gamma table (equivalent to
legacy gamma ramp), but there was one I have fairly detailed
documentation for which also supports everything.

There might be some interest from others in having a single 12-bit post-CSC gamma LUT. So I was going to submit another series to enable this as a special case just for some generations of the Intel hardware once this work lands.

Obviously, in order to program the hardware in that mode, you would need a userspace specially tuned for the particular platform on which it's running. This mode wouldn't be exposed through the current set of properties.


The degamma/colour-transform-matrix/gamma model is definitely a good
one, and it seems like everyone agrees on a 3x3 matrix for CTM. So
far, so good. What I worry about is the _values_ we put into the CTM.

Intel supports two quite fun properties of matrix output. Firstly, the
range is (-3.0..3.0) rather than the (0.0..1.0) you might expect.
Negative values are axis-mirrored, i.e. lut2_index =
fabs(matrix_output), thus giving us a LUT range of (0.0..3.0).
Secondly, whilst (0.0..1.0) is represented by linear LUT entries, the
LUT values for (1.0..3.0) are calculated by a linear interpolation
between LUT entry #512 (i.e. that for 1.0) and a bonus entry #513
(value for 3.0). I haven't seen this supported anywhere else, so would
tend towards mirroring the last value into the extra supernumerary
entry, i.e. emulating saturation of matrix output values above 1.0.

It's good you mention this, because I wrote a test assuming negative values would be clamped to (0.0..1.0) and the test passes :/
So maybe there is something fishy here...


I don't really know what to do about negative values as CTM output,
since the doc I have here is silent on whether negative values are
similarly axis-mirrored/sign-stripped, or whether they are instead
clamped to 0.0. Either way, I'm not really sure it's behaviour we can
rely upon to be portable.

Maybe we should compute both and verify at least one works?
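Something along these lines, computing both expected outputs and
checking which one matches the hardware (helper names are mine, purely
illustrative):

#include <math.h>

/* Hypothesis A: negative matrix outputs are axis-mirrored. */
static float ctm_negative_mirror(float matrix_output)
{
        return fabsf(matrix_output);
}

/* Hypothesis B: negative matrix outputs are clamped to 0.0. */
static float ctm_negative_clamp(float matrix_output)
{
        return matrix_output < 0.0f ? 0.0f : matrix_output;
}

A test could run both expectations through the LUT model and compare
each against a CRC of the actual hardware output.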

We also briefly discussed the multiplication results. My tests show the Intel hardware seems to truncate the results rather than round them to nearest. Say you have a pixel value of 255 going through a 0.5 coefficient: you should get 127.5, which would round to 128, but the Intel hardware gives us 127.
Same for a pixel at 255 and a coefficient of 0.25: you would expect 63.75, which would round to 64, but my results show 63.
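In code, the two behaviours would be something like this (just a
sketch; helper names are mine):

#include <math.h>
#include <stdint.h>

/* Truncation (round toward zero), which matches what I observe: */
static uint8_t mul_trunc(uint8_t pixel, float coeff)
{
        return (uint8_t)(pixel * coeff);        /* 255 * 0.5 -> 127 */
}

/* Round to nearest, which is what I originally expected: */
static uint8_t mul_round(uint8_t pixel, float coeff)
{
        return (uint8_t)lroundf(pixel * coeff); /* 255 * 0.5 -> 128 */
}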

Again, computing both hypotheses might be a solution.


As a detail, the architecture I'm looking at has mixed granularity for
the second (post-CTM) LUT: lower RGB-value entries have higher
granularity (precision in LUT indexing), with lower granularity for
higher entries. I don't think this is a problem though, since we can
just decimate in the kernel (i.e. ignore every n'th LUT entry, to
write a smaller LUT to hardware than we received from userspace).
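As a sketch, the decimation could be as simple as this, assuming the
userspace LUT length is an integer multiple of the hardware length
(names illustrative):

#include <stdint.h>

/*
 * Keep every n'th entry of the userspace LUT to fit the hardware
 * LUT. A real implementation would also want to preserve the
 * final entry exactly.
 */
static void decimate_lut(const uint16_t *user, unsigned int user_len,
                         uint16_t *hw, unsigned int hw_len)
{
        unsigned int step = user_len / hw_len;
        unsigned int i;

        for (i = 0; i < hw_len; i++)
                hw[i] = user[i * step];
}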

Anyway, beyond that, it seems there are a few things we agree on (see
the sketch after this list):
   - optional pre-matrix ('degamma') per-channel LUT of variable
length, but (from a userspace point of view) fixed precision, input &
output ranges 0.0..1.0
   - optional 3x3 matrix with input range 0.0..1.0, with output values
saturating to 1.0, and negative values producing undefined behaviour
   - optional post-matrix ('gamma') per-channel LUT of variable length,
but (from a userspace point of view) fixed precision, input & output
ranges 0.0..1.0
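Expressed as data structures, that model might look something like the
following (a very rough sketch; the names and the fixed-point formats
are placeholders, not a uAPI proposal):

#include <stdint.h>

/* One entry of a variable-length, fixed-precision LUT. */
struct color_lut_entry {
        uint16_t red;   /* 0x0000 = 0.0, 0xffff = 1.0 */
        uint16_t green;
        uint16_t blue;
};

/* 3x3 colour transform matrix, e.g. in S31.32 fixed point. */
struct color_ctm {
        uint64_t matrix[9]; /* row-major */
};

/*
 * Per-CRTC pipeline: optional degamma LUT, optional CTM whose
 * output saturates to 1.0 (negative outputs undefined), then an
 * optional gamma LUT.
 */
struct crtc_color_state {
        struct color_lut_entry *degamma_lut; /* or NULL */
        unsigned int degamma_lut_size;
        struct color_ctm *ctm;               /* or NULL */
        struct color_lut_entry *gamma_lut;   /* or NULL */
        unsigned int gamma_lut_size;
};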

This would mean missing some Intel-specific features, but whether or
not this is actually required I don't really know. At least it seems
like it would be enough to implement standard ICC profile correction
from Weston in a hardware-independent manner.

Thierry, Alex, did you have any comments or ideas on this?

Cheers,
Daniel


Thanks for your feedback!

-
Lionel
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel



