Re: [PATCH v2 9/9] drm/i915: Add render decompression support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 11, 2017 at 1:49 PM, Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote:
On Tue, Jan 10, 2017 at 9:04 AM, Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
On Mon, Jan 09, 2017 at 11:20:57AM -0800, Jason Ekstrand wrote:
> On Thu, Jan 5, 2017 at 7:14 AM, <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
>
> > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> >
> > SKL+ display engine can scan out certain kinds of compressed surfaces
> > produced by the render engine. This involved telling the display engine
> > the location of the color control surfae (CCS) which describes
> > which parts of the main surface are compressed and which are not. The
> > location of CCS is provided by userspace as just another plane with its
> > own offset.
> >
> > Add the required stuff to validate the user provided AUX plane metadata
> > and convert the user provided linear offset into something the hardware
> > can consume.
> >
> > Due to hardware limitations we require that the main surface and
> > the AUX surface (CCS) be part of the same bo. The hardware also
> > makes life hard by not allowing you to provide separate x/y offsets
> > for the main and AUX surfaces (excpet with NV12), so finding suitable
> > offsets for both requires a bit of work. Assuming we still want keep
> > playing tricks with the offsets. I've just gone with a dumb "search
> > backward for suitable offsets" approach, which is far from optimal,
> > but it works.
> >
> > Also not all planes will be capable of scanning out compressed surfaces,
> > and eg. 90/270 degree rotation is not supported in combination with
> > decompression either.
> >
> > This patch may contain work from at least the following people:
> > * Vandana Kannan <vandana.kannan@xxxxxxxxx>
> > * Daniel Vetter <daniel@xxxxxxxx>
> > * Ben Widawsky <ben@xxxxxxxxxxxx>
> >
> > v2: Deal with display workarounds 0390, 0531, 1125 (Paulo)
> >
> > Cc: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> > Cc: Vandana Kannan <vandana.kannan@xxxxxxxxx>
> > Cc: Daniel Vetter <daniel@xxxxxxxx>
> > Cc: Ben Widawsky <ben@xxxxxxxxxxxx>
> > Cc: Jason Ekstrand <jason@xxxxxxxxxxxxxx>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_reg.h      |  23 ++++
> >  drivers/gpu/drm/i915/intel_display.c | 234 ++++++++++++++++++++++++++++++
> > ++---
> >  drivers/gpu/drm/i915/intel_pm.c      |  29 ++++-
> >  drivers/gpu/drm/i915/intel_sprite.c  |   5 +
> >  4 files changed, 274 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_
> > reg.h
> > index 00970aa77afa..6849ba93f1d9 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -6209,6 +6209,28 @@ enum {
> >                         _ID(id, _PS_ECC_STAT_1A, _PS_ECC_STAT_2A),   \
> >                         _ID(id, _PS_ECC_STAT_1B, _PS_ECC_STAT_2B))
> >
> > +#define PLANE_AUX_DIST_1_A             0x701c0
> > +#define PLANE_AUX_DIST_2_A             0x702c0
> > +#define PLANE_AUX_DIST_1_B             0x711c0
> > +#define PLANE_AUX_DIST_2_B             0x712c0
> > +#define _PLANE_AUX_DIST_1(pipe) \
> > +                       _PIPE(pipe, PLANE_AUX_DIST_1_A, PLANE_AUX_DIST_1_B)
> > +#define _PLANE_AUX_DIST_2(pipe) \
> > +                       _PIPE(pipe, PLANE_AUX_DIST_2_A, PLANE_AUX_DIST_2_B)
> > +#define PLANE_AUX_DIST(pipe, plane)     \
> > +       _MMIO_PLANE(plane, _PLANE_AUX_DIST_1(pipe),
> > _PLANE_AUX_DIST_2(pipe))
> > +
> > +#define PLANE_AUX_OFFSET_1_A           0x701c4
> > +#define PLANE_AUX_OFFSET_2_A           0x702c4
> > +#define PLANE_AUX_OFFSET_1_B           0x711c4
> > +#define PLANE_AUX_OFFSET_2_B           0x712c4
> > +#define _PLANE_AUX_OFFSET_1(pipe)       \
> > +               _PIPE(pipe, PLANE_AUX_OFFSET_1_A, PLANE_AUX_OFFSET_1_B)
> > +#define _PLANE_AUX_OFFSET_2(pipe)       \
> > +               _PIPE(pipe, PLANE_AUX_OFFSET_2_A, PLANE_AUX_OFFSET_2_B)
> > +#define PLANE_AUX_OFFSET(pipe, plane)   \
> > +       _MMIO_PLANE(plane, _PLANE_AUX_OFFSET_1(pipe),
> > _PLANE_AUX_OFFSET_2(pipe))
> > +
> >  /* legacy palette */
> >  #define _LGC_PALETTE_A           0x4a000
> >  #define _LGC_PALETTE_B           0x4a800
> > @@ -6433,6 +6455,7 @@ enum {
> >  # define CHICKEN3_DGMG_DONE_FIX_DISABLE                (1 << 2)
> >
> >  #define CHICKEN_PAR1_1         _MMIO(0x42080)
> > +#define  SKL_RC_HASH_OUTSIDE   (1 << 15)
> >  #define  DPA_MASK_VBLANK_SRD   (1 << 15)
> >  #define  FORCE_ARB_IDLE_PLANES (1 << 14)
> >  #define  SKL_EDP_PSR_FIX_RDWRAP        (1 << 3)
> > diff --git a/drivers/gpu/drm/i915/intel_display.c
> > b/drivers/gpu/drm/i915/intel_display.c
> > index 38de9df0ec60..2236abebd8bc 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -2064,11 +2064,19 @@ intel_tile_width_bytes(const struct
> > drm_framebuffer *fb, int plane)
> >                         return 128;
> >                 else
> >                         return 512;
> > +       case I915_FORMAT_MOD_Y_TILED_CCS:
> > +               if (plane == 1)
> > +                       return 64;
> > +               /* fall through */
> >         case I915_FORMAT_MOD_Y_TILED:
> >                 if (IS_GEN2(dev_priv) || HAS_128_BYTE_Y_TILING(dev_priv))
> >                         return 128;
> >                 else
> >                         return 512;
> > +       case I915_FORMAT_MOD_Yf_TILED_CCS:
> > +               if (plane == 1)
> > +                       return 64;
> >
>
> I still think a CCS tile is 128B wide. :-)

The spec kinda suggests the same. But I still couldn't figure out where
that notion really came from, so I just went with the value that gave
me the expected result on my screen. That is writing 64 bytes into the
tile is exactly what's required to fill a single row/column, writing
more would wrap.

Yeah.  Ultimately it doesn't matter.  It's an arbitrary number userspace multiplies by and the kernel divides by.  We just need to settle on something sensible.  In userspace, we've more-or-less settled on 128 to match the docs (and what the hardware does for W-tiling and some other edge cases) but it's not a huge deal.
 
>
>
> > +               /* fall through */
> >         case I915_FORMAT_MOD_Yf_TILED:
> >                 /*
> >                  * Bspec seems to suggest that the Yf tile width would
> > @@ -2156,7 +2164,7 @@ static unsigned int intel_surf_alignment(const
> > struct drm_framebuffer *fb,
> >         struct drm_i915_private *dev_priv = to_i915(fb->dev);
> >
> >         /* AUX_DIST needs only 4K alignment */
> > -       if (fb->format->format == DRM_FORMAT_NV12 && plane == 1)
> > +       if (plane == 1)
> >                 return 4096;
> >
> >         switch (fb->modifier) {
> > @@ -2166,6 +2174,8 @@ static unsigned int intel_surf_alignment(const
> > struct drm_framebuffer *fb,
> >                 if (INTEL_GEN(dev_priv) >= 9)
> >                         return 256 * 1024;
> >                 return 0;
> > +       case I915_FORMAT_MOD_Y_TILED_CCS:
> > +       case I915_FORMAT_MOD_Yf_TILED_CCS:
> >         case I915_FORMAT_MOD_Y_TILED:
> >         case I915_FORMAT_MOD_Yf_TILED:
> >                 return 1 * 1024 * 1024;
> > @@ -2472,6 +2482,7 @@ static unsigned int intel_fb_modifier_to_tiling(uint64_t
> > fb_modifier)
> >         case I915_FORMAT_MOD_X_TILED:
> >                 return I915_TILING_X;
> >         case I915_FORMAT_MOD_Y_TILED:
> > +       case I915_FORMAT_MOD_Y_TILED_CCS:
> >                 return I915_TILING_Y;
> >         default:
> >                 return I915_TILING_NONE;
> > @@ -2536,6 +2547,35 @@ intel_fill_fb_info(struct drm_i915_private
> > *dev_priv,
> >
> >                 intel_fb_offset_to_xy(&x, &y, fb, i);
> >
> > +               if ((fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
> > +                    fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS) && i ==
> > 1) {
> > +                       int main_x, main_y;
> > +                       int ccs_x, ccs_y;
> > +
> > +                       /*
> > +                        * Each byte of CCS corresponds to a 16x8 area of
> > the main surface, and
> > +                        * each CCS tile is 64x64 bytes.
> > +                        */
> > +                       ccs_x = (x * 16) % (64 * 16);
> > +                       ccs_y = (y * 8) % (64 * 8);
> >
>
> This makes me nervous.  Why are we multiplying CCS coordinates by something
> before we do the modulus?  Why aren't coordinates in both surfaces in
> pixels?

For converting the linear offset (which is in bytes) into x/y we just
consider each pixel to be 1 byte. Hence to get the corresponding pixel
coordinates we multiply the byte based coordinates by 16x8. We can't
really deal with <1 byte pixels in most places.

So are the x/y offsets provided by userspace in bytes or pixels for regular color surfaces?  I had kind-of assumed pixels, but I guess I could also see bytes.
 
> So long as you keep things in pixesl and know that a CCS tile is
> 1024x512px and a color tile is 32x32 pixels, you can safely do tile
> offsetting and it all makes sense.  Having different units looks like a
> recipe for some very confusing bugs.  Am I just completely misunderstanding
> what's going on here?

Doing things in pixels would involve totally custom code for the CCS.
By thinking of CCS as having 1 byte pixels we can share the code already
used for everything else (apart from this one special check which is
really only necessary because the HW ignores the AUX x/y offsets for CCS.

I suppose it would be possible to rewrite a bunch of other things to
allow <1 byte pixels but I couldn't be bothered to go there.

I think I need a better mental model of what the X/Y offset API looks like before I can reply to that properly.

I had a very useful chat with Kristian on IRC today and he explained that the offset in drm_mode_fb_cmd2 is just to get you to the upper-left pixel of the image and the actual X/Y offset used for scanout is provided in pixels when you do SetCrtc.  This seems completely sane and, at that point, I really don't care how you do things internally so long as the results are correct.  Consider my comments here dropped.  I didn't realize I was arguing over an implementation detail.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux