Re: [PATCH] tests/kms_ccs: Fix the color/ccs surface generation

Jason Ekstrand <jason@xxxxxxxxxxxxxx> · Fri, 04 Aug 2017 07:56:11 -0700




On August 4, 2017 2:59:56 AM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:

Hi Jason,

On 4 August 2017 at 01:52, Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote:
Previously, the test used the old 64x64 convention that Ville introduced
for CCS tiles and not the current 128x32 Y-tile convention.  Also, the
original scheme for generating the CCS data was over-complicated and
didn't work correctly because it assumed you could cut the main surface
at an arbitrary Y coordinate.  While you clearly *can* do this (the
hardware does), it's not a good idea for a generator in a test.  The new
scheme, introduced here, is entirely based on the relationship between
cache-lines in the main surface and the CCS that's documented in the
PRM.  By keeping everything CCS cache-line aligned, our chances of
generating correct data for an arbitrary-size surface are much higher.

Thanks for fixing this!

+                * We need to cut the surface on a CCS cache-line boundary,
+                * otherwise, we're going to be in trouble when we try to
+                * generate CCS data for the surface.  A cache line in the
+                * CCS is 16x16 cache-line-pairs in the main surface.  16
+                * cache lines is 64 rows high.
+                */
+               half_height = ALIGN(height, 128) / 2;
+               half_size = half_height * stride;
+               for (i = 0; i < fb->size / 4; i++) {
+                       if (i < half_size / 4)
+                               ptr[i] = RED;
+                       else
+                               ptr[i] = COMPRESSED_RED;

I toyed with:
else if (!(i & 0xe))
        ptr[i] = COMPRESSED_RED;
else
        ptr[i] = BLACK;

... to make the failure mode even more obvious. But that still writes
in (far) more compressed-red pixels than we strictly need for CCS,
right? Anyway, follow-up patch.

Yeah, we can probably do quite a bit of patterning so long as we keep the 
compressed/uncompressed split simple and make sure our pattern works in 
whole cache lines.

+static unsigned int
+y_tile_y_pos(unsigned int offset, unsigned int stride)
 {
-       return ptr +
-               ((y & ~0x3f) * stride) +
-               ((x & ~0x7) * 64) +
-               ((y & 0x3f) * 8) +
-               (x & 7);
+       unsigned int y_tiles, y;
+       y_tiles = (offset / 4096) / (stride / 128);
+       y = y_tiles * 32 + ((offset & 0x1f0) >> 4);
+       return y;
 }

@@ -143,12 +168,26 @@ static void display_fb(data_t *data, int compressed)
        size[0] = f.pitches[0] * ALIGN(height, 32);

        if (compressed) {
-               width = ALIGN(f.width, 16) / 16;
-               height = ALIGN(f.height, 8) / 8;
-               f.pitches[1] = ALIGN(width * 1, 64);
+               /* From the Sky Lake PRM, Vol 12, "Color Control Surface":
+                *
+                *    "The compression state of the cache-line pair is
+                *    specified by 2 bits in the CCS.  Each CCS cache-line
+                *    represents an area on the main surface of 16x16 sets
+                *    of 128 byte Y-tiled cache-line-pairs. CCS is always Y
+                *    tiled."
+                *
+                * A "cache-line-pair" for a Y-tiled surface is two
+                * horizontally adjacent cache lines.  When operating in
+                * bytes and rows, this gives us a scale-down factor of
+                * 32x16.  Since the main surface has a 32-bit format, we
+                * need to multiply width by 4 to get bytes.
+                */

Yeah, this code and comment match better (& helped clarify) my
understanding of how it works. But given my understanding was mostly
gleaned from other, borderline incomprehensible, comments, as well as
manually poking bits and observing the result, maybe that's not a
ringing endorsement.

+               width = ALIGN(f.width * 4, 32) / 32;
+               height = ALIGN(f.height, 16) / 16;
+               f.pitches[1] = ALIGN(width * 1, 128);
                f.modifier[1] = modifier;
                f.offsets[1] = size[0];
-               size[1] = f.pitches[1] * ALIGN(height, 64);
+               size[1] = f.pitches[1] * ALIGN(height, 32);

I changed this to f.height rather than height, because otherwise the
kernel was rejecting the aux buffer for being too small.

Congratulations, you found a bug in the kernel branch you're running.  The 
downsized height is definitely what we want and it works fine with my 
kernel branch.

With that, it now passes on SKL + APL for me, so I've pushed with my
review. Thanks!

Cheers,
Daniel


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx