RE: DRM/KMS PWL API Thoughts and Questions

"Shankar, Uma" <uma.shankar@xxxxxxxxx> · Mon, 13 Jun 2022 21:05:05 +0000

> -----Original Message-----
> From: Harry Wentland <harry.wentland@xxxxxxx>
> Sent: Saturday, June 11, 2022 1:59 AM
> To: Pekka Paalanen <ppaalanen@xxxxxxxxx>; sebastian@xxxxxxxxxxxxxxxxx;
> Shankar, Uma <uma.shankar@xxxxxxxxx>
> Cc: Vitaly Prosyak <vitaly.prosyak@xxxxxxx>; Sharma, Shashank
> <Shashank.Sharma@xxxxxxx>; Lakha, Bhawanpreet
> <Bhawanpreet.Lakha@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Alex Hung <alex.hung@xxxxxxx>; dri-devel <dri-
> devel@xxxxxxxxxxxxxxxxxxxxx>; amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; intel-
> gfx@xxxxxxxxxxxxxxxxxxxxx; Modem, Bhanuprakash
> <bhanuprakash.modem@xxxxxxxxx>
> Subject: DRM/KMS PWL API Thoughts and Questions
> 
> (I'm sending this as an email as lowest common denominator but feel an issue on the
> color-and-hdr repo would be a better interface for productive discussion. Please pop
> over to https://gitlab.freedesktop.org/pq/color-and-hdr/-/issues/10 if you agree.
> Hopefully we can drive the discussion there but if there is a strong preference for
> email that works as well. :) )
> 
> I've wanted to start a thread to discuss the use of PWL APIs that were introduced by
> Uma a year ago and for which Bhanuprakash provided IGT tests. I have come to like
> the API but as we're getting closer to a real-world use of it I have a few questions
> and comments. As with a lot of complex APIs the devil is in the details. Some of
> those details are currently underspecified, or underdocumented and it's important
> that we all interpret the API the same way.

Thanks Harry for starting this thread. Adding Ville to the discussion as well as the original design
is his brain child.

We can discuss on the issue you created if all others agree.

> **The API**
> 
> The original patches posted by Uma:
> https://patchwork.freedesktop.org/series/90822/
> https://patchwork.freedesktop.org/series/90826/
> 
> The IGT tests for PWL API:
> https://patchwork.freedesktop.org/series/96895/
> 
> I've rebased the kernel patches on a slightly more recent kernel, along with an AMD
> implementation:
> https://gitlab.freedesktop.org/hwentland/linux/-/tree/color-and-hdr
> 
> I've also rebased them on an IGT tree, but that's not too up-to-date:
> https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/tree/color-and-hdr
> 
> 
> **Why do I like the API?**
> 
> In order to allow HW composition of HDR planes in linear space we need the ability
> to program at least a per-CRTC regamma (inv_EOTF) to go from linear to wire
> format post-blending. Since userspace might want to apply corrections on top of a
> simple transfer function (such as PQ, BT.709, etc.) it would like a way to set a custom
> LUT.
> 
> The existing CRTC gamma LUT defines equally spaced entries. As Pekka shows in [1]
> equally-spaced LUTs have unacceptable error for regamma/inv_EOTF. Hence we
> need finer granularity of our entries near zero while coarse granularity works fine
> toward the brighter values.
> 
> [1] https://gitlab.freedesktop.org/pq/color-and-hdr/-/merge_requests/9
> 
> HW (at least AMD and Intel HW) implements this ability as segmented piece-wise
> linear LUTs. These define segments of equally spaced entries. These segments are
> constrained by the HW implementation. I like how the PWL API allows different
> drivers to specify the constraints imposed by different HW while allowing userspace
> a generic way of parsing the PWL. This also avoids complex calculations in the kernel
> driver, which might be required for other APIs one could envision. If anyone likes I
> can elaborate on some ideas for an alternate API, though all of them will require
> non-trivial transformations by the kernel driver in order to program them to HW.
> 

I feel the current design fits best to most of the implementations, even legacy API/hardware
can be defined using the proposed UAPI's. Its good that we are in agreement and we can surely
iron out the UAPI and document the expectations.

> **Nitpicks**
> 
> The API defines everything inside the segments, including flags and values that apply
> to the entire PWL, such as DRM_MODE_LUT_GAMMA,
> DRM_MODE_LUT_REFLECT_NEGATIVE, input_bpc, and output_bpc. If these don't
> stay constant for segments it might complicate the interpretation of segments. I
> suggest we consider these as effectively applying to the entire PWL. We could
> encode them in an overall drm_color_lut struct that includes an array of
> drm_color_lut_range but that's probably not necessary, hence why I called this out
> as a nitpick. I would just like us to be aware of this ambiguity and document that
> these values applies to the entire PWL.

It can be done, no concerns as such.

> 
> **How to read the PWL**
> 
> Let me first give a summary for how this LUT is used in userspace. If you're familiar
> with this please review and comment if I got things wrong. As I mentioned, a lot of
> this is underspecified at the moment so you're reading my interpretation.
> 
> You can see this behavior in plane_degamma_test [2] in the kms_color.c IGT test
> suite. I suggest the plane_degamma_test here here instead of the test_pipe_gamma
> test as the latter still has Intelisms (assumptions around Intel driver/HW behavior)
> and will not work for other drivers.
> 
> Iterate over all enums in PLANE_DEGAMMA_MODE and find a suitable one. How do
> we find the suitable one? More on that below.
> 
> Once we have the right PLANE_DEGAMMA_MODE we read the blob for the blob ID
> associated with the PLANE_DEGAMMA_MODE enum. We interpret the blob as an
> array of drm_color_lut_range. See get_segment_data [3].
> 
> We can think of our LUT/PWL as f(x) = y. For a traditional equally spaced LUT with
> 1024 entries x would be 0, 1, 2, ..., 1023. For a PWL LUT we need to parse the
> segment data provided in drm_color_lut_range.
> 
> Let's look at the 2nd-last entry of the nonlinear_pwl definition for the AMD driver [4]
> (I've correct it here and dropped the DRM_MODE_LUT_REUSE_LAST but it's still
> incorrect in the link) and simplify it to 4 entries for sake of readability:
> 
> {
>         .flags = (DRM_MODE_LUT_GAMMA | DRM_MODE_LUT_REFLECT_NEGATIVE |
> DRM_MODE_LUT_INTERPOLATE | DRM_MODE_LUT_NON_DECREASING),
>         .count = 4,
>         .input_bpc = 13, .output_bpc = 18,
>         .start = 1 << 12, .end = 1 << 13,
>         .min = 0, .max = 1UL << 35
>     },
> 
> We see we have 16 entries in the region from (1 << 12) to (1 << 13). We see
> input_bpc is 13, so our 1.0 float value is 1 << 13.
> 
> (Is it sensible to use input_bpc as our 1.0 floating point reference? Why?)
> 
> Since this segment is not reusing the last entry (doesn't have
> DRM_MODE_LUT_REUSE_LAST) we divide the region between 1 << 12 and 1 << 13
> into 4 equally spaced sections.
> 
> step_size = (segment->end - segment->start) / count
> 
> In our case our segment spans from 4096 to 8192 and yields a step_size of 1024.
> 
> Note that we need to calculate this in floating point, otherwise we're not
> guaranteed equal spacing.
> 
> This gives us these X entries for this particular segment:
> 4096, 5120, 6144, 7168
> 
> And normalized to 1 << 13 (input_bpc) for our 1.0 float value we get 0.5, 0.625, 0.75,
> 0.875
> 
> If the segment had the REUSE_LAST flag the values would look like 4096, 5461,
> 6826, 8192
> 
> and normalized to
> 0.5, 0.666626, 0.833252, 1
> 

This understanding and explanation aligns with mine. @Ville Syrjälä Any thoughts or inputs here ?

> Though in the case of REUSE_LAST a more sensible definition might be with a count
> of 5 entries in this segment, instead of 16. But ultimatly that's up to the driver.
> 
> I attached a simple C program that parses a PWL and helps illustrate my
> interpretation. You'll just need to copy-paste or include your PWL definition (instead
> of color_gamma.h), and point the PWL define to your PWL variable. To build it you'll
> need to copy the drm-uapi folder from my IGT repo into the same directory as pwl-
> parser.c and then just build it with "gcc -Idrm-uapi pwl-parser.c".
> 
> The above entries come from running pwl-parser with the nonlinear_pwl from [5].
> 
> To illustrate these I added versions of your lut1d_error scripts that run on the SDR
> and HDR PWLs for AMD HW [6].
> 
> We should probably document the above in detail in the DRM/KMS API docs.
> 
> [2] https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/blob/color-and-
> hdr/tests/kms_color.c#L978
> [3] https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/blob/color-and-
> hdr/tests/kms_color_helper.c#L393
> [4] https://gitlab.freedesktop.org/hwentland/linux/-/blob/color-and-
> hdr/drivers/gpu/drm/amd/display/modules/color/color_gamma.h#L109
> [5] https://gitlab.freedesktop.org/hwentland/linux/-/blob/color-and-
> hdr/drivers/gpu/drm/amd/display/modules/color/color_gamma.h#L38
> [6] https://gitlab.freedesktop.org/hwentland/color-and-hdr/-/tree/precision/octave
> 
> **How to provide PWL entries?**
> 
> To provide the PWL values for each entry we'll have to sample our (custom) curve at
> the respective points specified by our PWL entries and providing those samples in a
> blob that is passed to PLANE_DEGAMMA_LUT. It's not much different from how we
> provide values for an equally spaced (legacy) LUT. As for sampling our curve, I
> remember seeing that Weston uses an LCMS function to sample the curve at
> required points. Last I checked it samples the curve at evenly spaced intervals but the
> LCMS function seemed to provide arbitrary sampling.
> 
> 
> **How to pick a suitable PWL definition?**
> 

It would be good to expand the term PWL and what we mean by it just to make it clear
to all. I assume you mean piece wise linear

> Picking the right PWL definition out of the respective \_MODE enum isn't trivial.
> Without further information a userspace implementer has to understand the
> distribution of entries in all segments and perform a bunch of math to estimate the
> error for given curves.
> 
> A simpler approach might be if we defined common naming for our PWL enums. We
> can define the commonly expected cases. The two important parameters are within-
> range vs overrange entries, and linear vs non-linear outputs.
> 

This was a bit different on Intel hardware, but we can surely generalize. We can make one
of the modes as default so user can just chose that in case no other useful mechanism to
pick a mode.

Having said that, once user gets the segment blob info for a respective mode, it should be
able to make a call to choose the most efficient one for its use case (based on number of segment,
range and number of lut entries)

> Within-range PWLs would cover [0.0, 1.0] entries and overrange [0.0, 128.0] to
> cover PQ and probably anything else. One could think of the within-range PWL as
> intended for SDR content and the over-range PWL for HDR content.
> 
> Linear PWLs would be intended for linear luminance outputs (or near-linear), and can
> be represented by equally spaced LUTs, such as a single-segment definitions. Non-
> linear PWLs would be intended for linear to non-linear transforms; Linear PWLs for
> non-linear to linear transforms or OOTFs.
> 
> This gives us four enums, plus one for bypass:
> DRM_MODE_LUT_BYPASS
> DRM_MODE_LUT_LINEAR_SDR
> DRM_MODE_LUT_NONLINEAR_SDR
> DRM_MODE_LUT_LINEAR_HDR
> DRM_MODE_LUT_NONLINEAR_HDR
> 
> Drivers can provide appropriate PWLs based on the HW caps. Userspace can pick an
> appropriate one if it's available or fall back to either a sub-optimal PWL or to using a
> GPU transform instead.

Looks fine to me.

Regards,
Uma Shankar
> 
> Thoughts?
> 
> Thanks,
> Harry