[RFC] drm/amdgpu: Add macros and documentation for format modifiers.

bas@xxxxxxxxxxxxxxxxxxx (Bas Nieuwenhuizen) · Tue, 4 Sep 2018 14:21:34 +0200

On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>
> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
> <bas at basnieuwenhuizen.nl> wrote:
> > This is an initial proposal for format modifiers for AMD hardware.
> >
> > It uses 48 bits including a chip generation, leaving 8 bits for
> > a format version number.
> >
> > On gfx6-gfx8 we have all the fields influencing sample locations
> > in memory.
> >
> > Tile split bytes are optional for single sample buffers as no
> > hardware reaches the split size with 1 sample and hence the actual
> > size does not matter.
> >
> > The macrotile fields are duplicated for images with multiple planes.
> > If the planes have different bitdepth they need different macro
> > tile fields and different tile split bytes if multisample.
> >
> > I could not fit multiple copies in for tile split bytes, but
> > multisample & multiplane images are very rare. Overall, I think
> > we should punt on multisample for a later format version since
> > they are generally not shared on any modifier aware windowing
> > system, and we have more issues like fmask & cmask.
> >
> > We need these copies because the drm modifier of all planes in an
> > image needs to be equal, so we need to fit these together.
> >
> > This adds fields for compression support, using metadata that is
> > compatible with AMDVLK and for which radv and radeonsi can
> > reasonably be extended.
> >
> > The big open question for compression is between which generations
> > the format changed to see if we can share more.
> >
> > This explicitly does not try to solve the linear stride alignment
> > issue, thoguh we could internally just use the tiling modes for
> > the linear modes to indicate linear images with the stride for the
> > given chip.
> >
> > Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> > CC: Chad Versace <chadversary at chromium.org>
> > CC: Dave Airlie <airlied at redhat.com>
> > CC: Marek OlÅ¡Ã¡k <marek.olsak at amd.com>
> > CC: Nicolai HÃ¤hnle <nicolai.haehnle at amd.com>
> > CC: Alex Deucher <alexander.deucher at amd.com>
> > CC: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ---
> >  include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 130 insertions(+)
> >
> > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> > index 94444eeba55b..4e1452161dbf 100644
> > --- a/include/uapi/drm/amdgpu_drm.h
> > +++ b/include/uapi/drm/amdgpu_drm.h
> > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
> >  #define AMDGPU_FAMILY_AI                       141 /* Vega10 */
> >  #define AMDGPU_FAMILY_RV                       142 /* Raven */
> >
> > +#define AMDGPU_CHIP_TAHITI     0
> > +#define AMDGPU_CHIP_PITCAIRN   1
> > +#define AMDGPU_CHIP_VERDE      2
> > +#define AMDGPU_CHIP_OLAND      3
> > +#define AMDGPU_CHIP_HAINAN     4
> > +#define AMDGPU_CHIP_BONAIRE    5
> > +#define AMDGPU_CHIP_KAVERI     6
> > +#define AMDGPU_CHIP_KABINI     7
> > +#define AMDGPU_CHIP_HAWAII     8
> > +#define AMDGPU_CHIP_MULLINS    9
> > +#define AMDGPU_CHIP_TOPAZ      10
> > +#define AMDGPU_CHIP_TONGA      11
> > +#define AMDGPU_CHIP_FIJI       12
> > +#define AMDGPU_CHIP_CARRIZO    13
> > +#define AMDGPU_CHIP_STONEY     14
> > +#define AMDGPU_CHIP_POLARIS10  15
> > +#define AMDGPU_CHIP_POLARIS11  16
> > +#define AMDGPU_CHIP_POLARIS12  17
> > +#define AMDGPU_CHIP_VEGAM      18
> > +#define AMDGPU_CHIP_VEGA10     19
> > +#define AMDGPU_CHIP_VEGA12     20
> > +#define AMDGPU_CHIP_VEGA20     21
> > +#define AMDGPU_CHIP_RAVEN      22
> > +
> > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
> > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> > + * different.
> > + *
> > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
> > + * instead.
> > + *
> > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
> > + * set.
> > + *
> > + * For other ARRAY_MODEs:
> > + *  - Only set TILE_SPLIT if the image is multisample.
> > + *
> > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> > + * different value there. The values are
> > + *   - depth           : 0
> > + *   - displayable     : 1
> > + *   - thin            : 2
> > + *   - thick (GFX6)    : 3
> > + *   - rotated (GFX7+) : 4
> > + *
> > + * TODO: What to do with multisample multi plane images? More tile split
> > + * fields don't fit if we want to keep a few bits for a format version.
> > + */
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT          0
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK           0xf
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT         4
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK          0x1f
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT          9
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK           0x7
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT     12
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK      0x7
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT          15
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK           0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT         17
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK          0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT   19
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK    0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT           21
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK            0x3
> > +
> > +/* Macrotile parameters for a second plane if existing */
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT                23
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK         0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT       25
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK                0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK  0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT         29
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK          0x3
> > +
> > +/* Macrotile parameters for a third plane if existing */
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT                31
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK         0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT       33
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK                0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK  0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT         37
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK          0x3
> > +
> > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT                0
> > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK         0x1f
> > +
> > +/* Whether to enable DCC compression.
> > + *
> > + * If enabled, exporting the surface results in three
> > + * planes:
> > + *   - color data
> > + *   - DCC data
> > + *   - a 64-byte block with
> > + *     - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed.
> > + *     - a 16-byte 0/1 bool as to whether the surface has fastclear data
> > + *     - a 8-byte chunk with the current fastclear colors
> > + *
> > + * To ensure we do not keep compressing and decompressing the surface, once it
> > + * has been decompressed no party may recompress again.
> > + *
> > + * Applications should not hand over images with fastclear data as not
> > + * all users can support it, however, to help both Vulkan implementations
> > + * with the allocation we keep it in the 64-byte block.
> > + *
> > + * TODO: Can scanout really not support fastclear data?
> > + * TODO: What to do with multiplane images?
> > + */
> > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT              39
> > +#define AMDGPU_MODIFIER_COMPRESSION_MASK               0x1
>
> Summary of the irc discussion with my 2 recommendations:
>
> - Make it clear in the spec here that all consumers must follow the 16
> byte boolean to control actual DCC usage (using a conditional register
> write in the command buffer). On intel we've put that into the
> modifier itself, along the lines of "I don't like compressed anymore,
> pls give me something else". Not that runtime renogiation works well
> (or well, at all).
>
> - If you have consumers which can't handle fastclear, make it a
> separate modifier.
>
> > +
> > +/* The chip this is compatible with.
> > + *
> > + * If compression is disabled, use
> > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > + *
> > + * With compression enabled please use the exact chip.
> > + *
> > + * TODO: Do some generations share DCC format?
> > + */
> > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>
> Do you really need all the combinations here of DCC + gpu gen + tiling
> details? When we had the entire discussion with nvidia folks they
> eventually agreed that they don't need the massive pile with every
> possible combination. Do you really plan to share all these different
> things?

So GPU gen is based on the fact that GFX6-GFX8 and GFX9 had a
significant change in tiling modes that are not compatible.
Furthermore, DCC formats have seen changes between chips in the same
generation and without more documentation on when those are compatible
I'd like to be conservative and for DCC put every chip in its own
class.

For whether to enable DCC: there are basically a bunch of conditions
in image creation parameters on when we can use it, so I'd like to
keep non-DCC versions available even for HW that supports DCC.

Tling on GFX9+ is only 5 bits so that is easy. For GFX6-GFX8 it is complicated.

Each chip has a configurable system-wide 32-entry table of tiling
modes (GFX7-GFX8 have the components split in two tables, but in
practice given a format there is only one choice for the second
table). The kernel has different settings for each chip to put in
there.

So if we do not want to enable sharing between GPUs we can just do a
precise chip + tiling mode index and be done with it. However, since
the entries are different between chips we have to either:
 1) make some kind of mapping from (chip, tiling index) to a canonical
value based on compatiblity. Which means for backwards compatibility
reasons we're never going to be able to change the table config in the
kernel again.
 2) include the value of the table entry in the modifier.

I chose solution 2 here. This means that while there is a lot of data
in here in practice the tiling mode only has ~32 options per chip per
format/bitdepth.

There are some blocks not bound by this table (mostly scanout), but in
practice the GL/vulkan driver just allocates from the tiling table
anyway so kernel driver is just going to have to expose the modifiers
corresponding to the tiling table to keep the amount limited.

So as far as fields go, we have
ARRAY_MODE = is this linear, micro-tiling, macro-tiling or thick-macro
tiling (for 3d textures, various thicknesses), sparse texture.
  linear is not relevant here. micro-tiling is relevant for cross-GPU
sharing if macro-tiling is not compatible. macro-tiling is optimal for
2d, sparse-texture could probably be aliased/out of scope.

MICRO_TILE = normal, display, depth, rotated display
  I think these are obviously worth keeping

PIPE_CONFIG = pretty much something constant over a chip but important
for everything that is not linear
TILE_SPLIT = relevant for multisample images. I think in practice
constant for pretty much everything except GFX6, where it is format
specific
BANK_WIDTH,BANK_HEIGHT,MACRO_TILE_ASPEC,NUM_BANKS = for macro tiles,
pretty much specific to chip + format, mostly independent of
everything else.
   because of the bitdepth dependence we have to have multiple for
multiplane images as some are e.g. one R plane + one GB plane.

>
> Note that e.g. on i915 we spec some of the tiling depending upon
> buffer size and buffer format (because that's how the hw works), not
> using explicit modifier flags for everything.
> -Daniel
>
> > +
> > +#define AMDGPU_MODIFIER_SET(field, value) \
> > +       (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT)
> > +#define AMDGPU_MODIFIER_GET(value, field) \
> > +       (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK)
> > +
> >  /*
> >   * Definition of free sync enter and exit signals
> >   * We may have more options in the future
> > --
> > 2.18.0
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch