On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> Alex Goins <agoins@xxxxxxxxxx> wrote:
> 
> > Thank you Harry and all other contributors for your work on this. Responses
> > inline -
> > 
> > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > 
> > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > Harry Wentland <harry.wentland@xxxxxxx> wrote:
> > > 
> > > > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > Sebastian Wick <sebastian.wick@xxxxxxxxxx> wrote:
> > > > > 
> > > > >> Thanks for continuing to work on this!
> > > > >>
> > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > >>> v2:
> > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > >>>  - Updated wording (Pekka)
> > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > > >>>    section (Pekka)
> > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > > > > 
> > > > > ...
> > > > > 
> > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > >>> +
> > > > >>> +	/* 1D enumerated curve */
> > > > >>> +	Color operation 42
> > > > >>> +	├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > >>> +	├─ "BYPASS": bool {true, false}
> > > > >>> +	├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > > >>> +	└─ "NEXT": immutable color operation ID = 43
> 
> > I know these are just examples, but I would also like to suggest the possibility
> > of an "identity" CURVE_1D_TYPE.
> > BYPASS = true might get different results
> > compared to setting an identity in some cases depending on the hardware. See
> > below for more on this, RE: implicit format conversions.
> > 
> > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> > offline discussions that it would nonetheless be helpful to expose enumerated
> > curves in order to hide the vendor-specific complexities of programming
> > segmented LUTs from clients. In that case, we would simply refer to the
> > enumerated curve when calculating/choosing segmented LUT entries.
> 
> That's a good idea.
> 
> > Another thing that came up in offline discussions is that we could use multiple
> > color operations to program a single operation in hardware. As I understand it,
> > AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> > "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> > we could combine them into a singular LUT in software, such that you can combine
> > e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> > precision from the custom LUT where it overlaps with the linear section of the
> > enumerated curve, but that is unavoidable and shouldn't be an issue in most
> > use-cases.
> 
> Indeed.
> 
> > Actually, the current examples in the proposal don't include a multiplier color
> > op, which might be useful. For AMD as above, but also for NVIDIA as the
> > following issue arises:
> > 
> > As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> > point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
> > to in floating point varies depending on the source content. If it's SDR
> > content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> > potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> > content, we want the max value in FP16 to be 125.0 (10,000 nits).
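To make the "combine into a singular LUT in software" idea above concrete, here is a purely illustrative sketch (not part of the proposal; all function names are made up). `pq_eotf` is the SMPTE ST 2084 EOTF normalized so both domain and range are [0.0, 1.0], and `compose_into_hw_lut` collapses an enumerated curve followed by a custom 1D LUT into one hardware LUT by sampling the composition:

```python
def pq_eotf(e):
    # SMPTE ST 2084 (PQ) EOTF, normalized: input and output in [0.0, 1.0],
    # where 1.0 corresponds to 10,000 nits.
    m1 = 2610 / 16384
    m2 = 2523 / 4096 * 128
    c1 = 3424 / 4096
    c2 = 2413 / 4096 * 32
    c3 = 2392 / 4096 * 32
    ep = e ** (1 / m2)
    return (max(ep - c1, 0.0) / (c2 - c3 * ep)) ** (1 / m1)

def sample_lut(lut, x):
    # Linearly interpolate a 1D LUT over the [0.0, 1.0] domain.
    pos = min(max(x, 0.0), 1.0) * (len(lut) - 1)
    i = int(pos)
    if i >= len(lut) - 1:
        return lut[-1]
    frac = pos - i
    return lut[i] * (1 - frac) + lut[i + 1] * frac

def compose_into_hw_lut(named_curve, custom_lut, hw_size=4096):
    # Collapse "named curve, then custom LUT" into a single hardware LUT.
    # Precision is lost where the custom LUT varies faster than the
    # composed sampling grid, as noted in the thread.
    return [sample_lut(custom_lut, named_curve(i / (hw_size - 1)))
            for i in range(hw_size)]
```

With an identity custom LUT this reduces to the enumerated curve itself; a night-light tint would instead be baked into `custom_lut`.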
> > My assumption
> > is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> > correct?
> 
> It would be against the UAPI design principles to tag content as HDR or
> SDR. What you can do instead is to expose a colorop with a multiplier of
> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> that the input is SDR or HDR to get the expected multiplier. You will
> never know what the content actually is, anyway.
> 
> Of course, if we want to have an arbitrary multiplier colorop that is
> somewhat standard, as in, exposed by many drivers to ease userspace
> development, you can certainly use any combination of your hardware
> features you need to realize the UAPI prescribed mathematical operation.
> 
> Since we are talking about floating-point in hardware, a multiplier
> does not significantly affect precision.
> 
> In order to mathematically define all colorops, I believe it is
> necessary to define all colorops in terms of floating-point values (as
> in math), even if they operate on fixed-point or integer. By this I
> mean that if the input is an 8 bpc unsigned integer pixel format, for
> instance, a raw pixel channel value of 0 is mapped to 0.0 and 255 is
> mapped to 1.0, and the color pipeline starts with the [0.0, 1.0]
> domain, not [0, 255]. We have to agree on this mapping for all channels
> on all pixel formats. However, there is a "but" further below.
> 
> I also propose that quantization range is NOT considered in the raw
> value mapping, so that we can handle quantization range in colorops
> explicitly, allowing us to e.g. handle sub-blacks and super-whites when
> necessary. (These are currently impossible to represent in the legacy
> color properties, because everything is converted to full range and
> clipped before any color operations.)
> 
> > From the given enumerated curves, it's not clear how they would map to the
> > above. Should the sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a
> > max FP16 value of 125.0?
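The raw-value mapping proposed above can be stated as a one-line sketch (illustrative only; the helper name is made up):

```python
def raw_to_nominal(raw, bpc):
    # Map a raw unsigned-integer channel value to the nominal
    # floating-point domain: 0 -> 0.0 and (2^bpc - 1) -> 1.0.
    # Quantization range is deliberately NOT applied here; e.g. limited
    # range 16..235 at 8 bpc maps to ~0.0627..0.9216 and would be
    # expanded by an explicit colorop later in the pipeline.
    return raw / ((1 << bpc) - 1)
```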
> > That may work, but it tends towards the "descriptive" notion of
> > assuming the source content, which may not be accurate in all cases. This is
> > also an issue for the custom 1D LUT, as the blob will need to be converted to
> > FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
> > value be, given that we no longer have any hint as to the source content?
> 
> In my opinion, all finite non-negative transfer functions should
> operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
> all sRGB, power 2.2, and PQ curves.
> 
> If we look at BT.2100, there is no such encoding even mentioned where
> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
> a built-in assumption of what the color spaces are and what the conversion
> is aiming to do. IOW, I would say that choice is opinionated from the
> start. The multiplier in BT.2100 is always 10000.
> 
> Given that elements like various kinds of look-up tables inherently
> assume that the domain is [0.0, 1.0] (because it is a table that
> has a beginning and an end, and the usual convention is that the
> beginning is zero and the end is one), I think it is best to stick to
> the [0.0, 1.0] range where possible. If we go out of that range, then
> we have to define how a LUT would apply in a sensible way.
> 
> Many TFs are intended to be defined only on [0.0, 1.0] -> [0.0, 1.0].
> Some curves, like power 2.2, have a mathematical form that naturally
> extends outside of that range. Power 2.2 generalizes to >1.0 input
> values as is, but not to negative input values. If needed for negative
> input values, it is common to use the mirroring y = -TF(-x) for x < 0.
> 
> scRGB is the prime example that intentionally uses negative channel
> values. We can also have negative channel values with limited
> quantization range, sometimes even intentionally (xvYCC chroma, or
> PLUGE test sub-blacks).
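The y = -TF(-x) mirroring rule for extending a transfer function to negative inputs can be sketched as follows (illustrative only), using a pure power-2.2 curve, which also extends naturally to inputs above 1.0:

```python
def tf_power22(x):
    # Pure power-2.2 transfer function. Inputs > 1.0 use the same
    # formula as-is; negative inputs use the mirror rule y = -TF(-x).
    if x < 0.0:
        return -((-x) ** 2.2)
    return x ** 2.2
```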
> Out-of-unit-range values can also appear as a
> side-effect of signal processing, and they should not get clipped
> prematurely. This is a challenge for colorops that fundamentally cannot
> handle out-of-unit-range values.
> 
> There are various workarounds. scRGB colorimetry can be converted into
> BT.2020 primaries, for example, to avoid saturation-induced negative
> values. A limited quantization range signal could be processed as-is,
> meaning that the limited range is mapped to [16.0/255, 235.0/255]
> instead of [0.0, 1.0] or so. But then, we have a complication with
> transfer functions.
> 
> > I think a multiplier color op solves all of these issues. Named curves and
> > custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
> > be adjusted by the multiplier.
> 
> Pretty much.
> 
> > For 80 nit SDR content, set it to 1, for 400
> > nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc.
> 
> That I think is another story.
> 
> > > > >>> +
> > > > >>> +	/* custom 4k entry 1D LUT */
> > > > >>> +	Color operation 52
> > > > >>> +	├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > > > >>> +	├─ "BYPASS": bool {true, false}
> > > > >>> +	├─ "LUT_1D_SIZE": immutable range = 4096
> > > > >>> +	├─ "LUT_1D": blob
> > > > >>> +	└─ "NEXT": immutable color operation ID = 0
> > > > > 
> > > > > ...
> > > > > 
> > > > >>> +Driver Forward/Backward Compatibility
> > > > >>> +=====================================
> > > > >>> +
> > > > >>> +As this is uAPI drivers can't regress color pipelines that have been
> > > > >>> +introduced for a given HW generation. New HW generations are free to
> > > > >>> +abandon color pipelines advertised for previous generations.
> > > > >>> +Nevertheless, it can be beneficial to carry support for existing color
> > > > >>> +pipelines forward as those will likely already have support in DRM
> > > > >>> +clients.
> > > > >>> +
> > > > >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> > > > >>> +disabled or are purely informational. DRM clients implementing support
> > > > >>> +for the pipeline can always skip unknown properties as long as they can
> > > > >>> +be confident that doing so will not cause unexpected results.
> > > > >>> +
> > > > >>> +If a new colorop doesn't fall into one of the above categories
> > > > >>> +(bypassable or informational) the modified pipeline would be unusable
> > > > >>> +for user space. In this case a new pipeline should be defined.
> > > > >>
> > > > >> How can user space detect an informational element? Should we just add a
> > > > >> BYPASS property to informational elements, make it read only and set to
> > > > >> true maybe? Or something more descriptive?
> > > > > 
> > > > > Read-only BYPASS set to true would be fine by me, I guess.
> > > > 
> > > > Don't you mean set to false? An informational element will always do
> > > > something, so it can't be bypassed.
> > > 
> > > Yeah, this is why we need a definition. I understand "informational" to
> > > not change pixel values in any way. Previously I had some weird idea
> > > that scaling doesn't alter color, but of course it may.
> > 
> > On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
> > implicit fixed-point to FP16 conversions, and vice versa.
> 
> Above, I claimed that the UAPI should be defined in nominal
> floating-point values, but I wonder, would that work? Would we need to
> have explicit colorops for converting from raw pixel data values into
> nominal floating-point in the UAPI?
> 
> > For example, the "degamma" LUT towards the beginning of the pipeline implicitly
> > converts from fixed point to FP16, and some of the following operations expect
> > to operate in FP16.
> > As such, if you have a fixed point input and don't bypass
> > those following operations, you *must not* bypass the LUT, even if you are
> > otherwise just programming it with the identity. Conversely, if you have a
> > floating point input, you *must* bypass the LUT.
> 
> Interesting. Since the color pipeline is not(?) meant to replace pixel
> format definitions, which already make the difference between fixed and
> floating point, wouldn't this little detail need to be taken care of by
> the driver under the hood?
> 
> What if I want to use a degamma colorop with a floating-point
> framebuffer? Simply not possible on this hardware?
> 
> > Could informational elements and allowing the exclusion of the BYPASS property
> > be used to convey this information to the client? For example, we could expose
> > one pipeline with the LUT exposed with read-only BYPASS set to false, and
> > sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
> > fixed point input. Then, expose another pipeline with the LUT missing, and an
> > informational "FP16" element in its place to accommodate floating point input.
> > 
> > That's just an example; we also have other operations in the pipeline that do
> > similar implicit conversions. In these cases we don't want the operations to be
> > bypassed individually, so instead we would expose them as mandatory in some
> > pipelines and missing in others, with informational elements to help inform the
> > client of which to choose. Is that acceptable under the current proposal?
> > 
> > Note that in this case, the information just has to do with what format the
> > pixels should be in; it doesn't correspond to any specific operation. So, I'm
> > not sure that BYPASS has any meaning for informational elements in this context.
> 
> Very good questions. Do we have to expose those conversions in the UAPI
> to make things work for this hardware?
> Meaning that we cannot assume all
> colorops work in nominal floating-point from a userspace perspective
> (perhaps with varying degrees of precision).

I had this in my original proposal, I think (or maybe I only thought
about it, not sure). We really should figure this one out. Can we get
away with normalized [0,1] fp as a user space abstraction or not?

> > > > > I think we also need a definition of "informational".
> > > > > 
> > > > > Counter-example 1: a colorop that represents a non-configurable
> > > > 
> > > > Not sure what's "counter" for these examples?
> > > > 
> > > > > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > > > > format. It cannot be set to bypass, it cannot be configured, and it
> > > > > will alter color values.
> > 
> > Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> > no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> > based on the principle that read-only blobs could be used to express some static
> > pipeline elements without the need to define a new type, but got mixed opinions.
> > I think this demonstrates the principle further, as clients could detect this
> > programmatically instead of having to special-case the informational element.

I'm all for exposing fixed color ops, but I suspect that most of those
follow some standard, and in those cases instead of exposing the matrix
values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
BT.2020). As a general rule: always expose the highest level
description. Going from a name to exact values is trivial; going from
values to a name is much harder.

> If the blob depends on the pixel format (i.e. the driver automatically
> chooses a different blob per pixel format), then I think we would need
> to expose all the blobs and how they correspond to pixel formats.
> Otherwise ok, I guess.
> 
> However, do we want or need to make a color pipeline or colorop
> conditional on pixel formats?
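To illustrate the "name to exact values is trivial" point: the full-range YCbCr->RGB matrix follows mechanically from the luma coefficients (Kr, Kb) of the named standard. A sketch (helper name made up, not a proposed uAPI):

```python
def ycbcr_to_rgb_matrix(kr, kb):
    # Derive the full-range YCbCr -> RGB 3x3 matrix (rows R, G, B;
    # columns Y, Cb, Cr, with Cb/Cr centered on zero) from the luma
    # coefficients of a named standard such as BT.601/709/2020.
    kg = 1.0 - kr - kb
    return [
        [1.0, 0.0, 2.0 - 2.0 * kr],
        [1.0, -(2.0 * kb * (1.0 - kb)) / kg, -(2.0 * kr * (1.0 - kr)) / kg],
        [1.0, 2.0 - 2.0 * kb, 0.0],
    ]

BT709 = ycbcr_to_rgb_matrix(kr=0.2126, kb=0.0722)
BT601 = ycbcr_to_rgb_matrix(kr=0.299, kb=0.114)
```

Going the other way, from a blob of raw values back to "this is BT.709", requires fuzzy matching against every known standard.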
> For example, if you use a YUV 4:2:0 type
> of pixel format, then you must use this pipeline and not any other. Or a
> floating-point type of pixel format. I did not anticipate this before;
> I assumed that all color pipelines and colorops are independent of the
> framebuffer pixel format. A specific colorop might have a property that
> needs to agree with the framebuffer pixel format, but I didn't expect
> further limitations.

We could simply fail commits when the pipeline and pixel format don't
work together. We'll probably need some kind of ingress no-op node
anyway, and maybe we could list pixel formats there, if required, to
make it easier to find a working configuration.

> "Without the need to define a new type" is something I think we need to
> consider case by case. I have a hard time giving a general opinion.
> 
> > > > > Counter-example 2: image size scaling colorop. It might not be
> > > > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > > > properties. You still need to understand what it does, so you can
> > > > > arrange the scaling to work correctly. (Do not want to scale an image
> > > > > with PQ-encoded values as Josh demonstrated in XDC.)
> > > > 
> > > > IMO the position of the scaling operation is the thing that's important
> > > > here as the color pipeline won't define scaling properties.
> > 
> > I agree that blending should ideally be done in linear space, and I remember
> > that from Josh's presentation at XDC, but I don't recall the same being said for
> > scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
> > pipeline that is meant to be in PQ space (more on this below), and that was
> > found to achieve better results at HDR/SDR boundaries. Of course, this only
> > bolsters the argument that it would be helpful to have an informational "scaler"
> > element to understand at which stage scaling takes place.
> 
> Both blending and scaling are fundamentally the same operation: you
> have two or more source colors (pixels), and you want to compute a
> weighted average of them following what happens in nature, that is,
> physics, as that is what humans are used to.
> 
> Both blending and scaling will suffer from the same problems if the
> operation is performed on non-light-linear values. The result of the
> weighted average does not correspond to physics.
> 
> The problem may be hard to observe with natural imagery, but Josh's
> example shows it very clearly. Maybe that effect is sometimes useful
> for some imagery in some use cases, but it is still an accidental
> side-effect. You might get even better results if you don't rely on
> accidental side-effects but design a separate operation for the exact
> goal you have.
> 
> Mind, by scaling we mean changing image size. Not scaling color values.
> 
> > > > > Counter-example 3: image sampling colorop. Averages FB-originated color
> > > > > values to produce a color sample. Again do not want to do this with
> > > > > PQ-encoded values.
> > > > 
> > > > Wouldn't this only happen during a scaling op?
> > > 
> > > There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> > > coordinates can be fractional, which makes nearest vs. bilinear
> > > sampling have a difference even if there is no scaling.
> > > 
> > > There is also the question of chroma siting with sub-sampled YUV. I
> > > don't know how that actually works, or how it theoretically should work.
> > 
> > We have some operations in our pipeline that are intended to be static, i.e. a
> > static matrix that converts from RGB to LMS, and later another that converts
> > from LMS to ICtCp. There are even LUTs that are intended to be static,
> > converting from linear to PQ and vice versa. All of this is because the
> > pre-blending scaler and tone mapping operator are intended to operate in ICtCp
> > PQ space.
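The weighted-average problem can be shown numerically; a small sketch with a pure power-2.2 encoding standing in for a real transfer function (illustrative only):

```python
def avg_encoded(a, b):
    # Naive average of two gamma-encoded values: what happens when
    # scaling or blending operates directly on non-linear data.
    return (a + b) / 2.0

def avg_linear(a, b, gamma=2.2):
    # Average in light-linear space, then re-encode: the physically
    # meaningful weighted average.
    lin = ((a ** gamma) + (b ** gamma)) / 2.0
    return lin ** (1.0 / gamma)
```

Averaging black (0.0) and white (1.0) gives 0.5 naively but about 0.73 when done in linear light, which is why the two approaches look visibly different at sharp boundaries.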
> > Although the stated LUTs and matrices are intended to be static, they
> > are actually programmable. In offline discussions, it was indicated that it
> > would be helpful to actually expose the programmability, as opposed to exposing
> > them as non-bypassable blocks, as some compositors may have novel uses for them.
> 
> Correct. Doing tone-mapping in ICtCp etc. is already policy that
> userspace might or might not agree with.
> 
> Exposing static colorops will help usages that adhere to current
> prevalent standards around very specific use cases. There may be
> millions of devices needing exactly that processing in their usage, but
> it is also quite limiting in what one can do with the hardware.
> 
> > Despite being programmable, the LUTs are updated in a manner that is less
> > efficient compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > if there was some way to tag operations according to their performance,
> > for example so that clients can prefer a high performance one when they
> > intend to do an animated transition? I recall from the XDC HDR workshop
> > that this is also an issue with AMD's 3DLUT, where updates can be too
> > slow to animate.
> 
> I can certainly see such information being useful, but then we need to
> somehow quantize the performance.
> 
> What I was left puzzled about after the XDC workshop is: is it
> possible to pre-load configurations in the background (slow), and then
> quickly switch between them? Hardware-wise, I mean.

We could define that pipelines with a lower ID are to be preferred over
higher IDs. The issue is that if programming a pipeline becomes too slow
to be useful it probably should just not be made available to user
space. The prepare-commit idea for blob properties would help to make
the pipelines usable again, but until then it's probably a good idea to
just not expose those pipelines.

> 
> Thanks,
> pq