Hi, On 22 January 2016 at 16:21, Daniel Vetter <daniel@xxxxxxxx> wrote: > On Fri, Jan 22, 2016 at 04:06:15PM +0000, Lionel Landwerlin wrote: >> On 22/01/16 15:04, Daniel Stone wrote: >> >Now with everything just using split-gamma mode, I'm much happier with >> >how this is looking. I took a look at some other architectures to see >> >how this would fit, and also had a chat with Richard Hughes to clear >> >some things up. AMD seems to support every possible mode under the >> >sun, so should support any API we came up with. Most other >> >architectures only implemented a single gamma table (equivalent to >> >legacy gamma ramp), but there was one I have fairly detailed >> >documentation for and also supports everything. >> >> There might be some interest from others to have a single 12bits post csc >> gamma LUT. >> So I was going to submit another serie to enable this as a special case just >> for some generations of the Intel hardware once this work lands. >> >> Obviously in order to program the hardware in that mode you would need to a >> userspace specially tuned for the particular platform on which it's running. >> This mode wouldn't be exposed through the current set of properties. > > Yeah, the idea there is that we don't tell this userspace explicitly, but > it can be used, i.e. > a) in the config properties we advertise the sizes of the split gamma > table > b) but when userspace supplies an atomic request that matches the larger > 12bit gamma, and no pre-ctm gamma, then we'll use that. Right, exactly. My concern was mainly that it seemed incredibly easy to program broken state (12-bit LUT seemingly active both pre- and post-CTM). I honestly wasn't sure whether the code or my expectations were wrong though. But adding it back properly is just fine. >> >Intel supports two quite fun properties of matrix output. Firstly, the >> >range is (-3.0..3.0) rather than the (0.0..1.0) you might expect. >> >Negative values are axis-mirrored, i.e. lut2_index = >> >fabs(matrix_output), thus giving us a LUT range of (0.0..3.0). >> >Secondly, whilst (0.0..1.0) is represented by linear LUT entries, the >> >LUT values for (1.0..3.0) are calculated by a linear interpolation >> >between LUT entry #512 (i.e. that for 1.0) and a bonus entry #513 >> >(value for 3.0). I haven't seen this supported anywhere else, so would >> >tend towards mirroring the last value into the extra supernumerary >> >entry, i.e. emulate saturation for matrix output values to 1.0. >> >> It's good you mention this, because I wrote a test assuming negative values >> would be clamped to (0.0..1.0) and the test passes :/ >> So maybe there is something fishy here... Hmm, the doc explicitly mentions the axis-mirroring/sign-stripping, but is either so weird or so incorrect that I think you'd probably only be able to tell empirically. ;) >> >I don't really know what to do about negative values as CTM output, >> >since the doc I have here is silent on whether negative values are >> >similarly axis-mirrored/sign-stripped, or whether they are instead >> >clamped to 0.0. Either way, I'm not really sure it's behaviour we can >> >rely upon to be portable. >> >> Maybe we should compute both and verify at least one works? Well, I'd have to write the entire CM support for that platform first, which might be a little while ... ;) >> We also discussed briefly the multiplication results. My tests show the >> Intel hardware seems to clamp the results. >> Let's say you have a 255 pixel going through a 0.5 coefficient, you should >> get 127.5 which would be rounded to 128. >> Intel hardware gives us 127. >> Same for a pixel at 255 and a coefficient of 0.25. You would get 63.75 so 64 >> rounded and my results show 63. >> >> Again, computing both hypothesis might be a solution. The docs I have here are silent on what happens. Not much we can do about it really, so I think we can just leave it as platform-dependent. >> >As a detail, the architecture I'm looking at has mixed granularity for >> >the second (post-CTM) LUT: lower RGB-value entries have higher >> >granularity (precision in LUT indexing), with lower granularity for >> >higher entries. I don't think this is a problem though, since we can >> >just decimate in the kernel (i.e. ignore every n'th LUT entry, to >> >write a smaller LUT to hardware than we received to userspace). > > This was the original plan (iirc one of the byt luts is like this too). Cool; it's the only thing I could think of which made any sense. >> >Anyway, beyond that, it seems there are a few things we agree on: >> > - optional pre-matrix ('degamma') per-channel LUT of variable >> >length, but (from a userspace point of view) fixed precision, input & >> >output ranges 0.0..1.0 >> > - optional 3x3 matrix with input range [0.0,1.0], with output values >> >saturating to 1.0, and negative values producing undefined behaviour >> > - optional post-matrix ('gamma') per-channel LUT of variable length, >> >but (from a userspace point of view) fixed precision, input & output >> >ranges 0.0..1.0 > > Yeah I thought same here, we'll support 0.0..1.0 and anything that > overshoots is platform-defined. I have no idea what to do with the > overshoot lut table entries, so I guess we'll just leave those until > someone screams badly ;-) Program the overshoot register (LUT entry #513) to the same as the final LUT entry (#512), and overshoot becomes saturation, i.e. desired semantics. Not really sure what to do with undershoot (clamp to 0 or mirror axes), but given that's completely hardware dependent and there doesn't seem much we can do about it, I'd leave it at documenting 'don't do that'. Cheers, Daniel _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel