On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote: > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > <bas at basnieuwenhuizen.nl> wrote: > > This is an initial proposal for format modifiers for AMD hardware. > > > > It uses 48 bits including a chip generation, leaving 8 bits for > > a format version number. > > > > On gfx6-gfx8 we have all the fields influencing sample locations > > in memory. > > > > Tile split bytes are optional for single sample buffers as no > > hardware reaches the split size with 1 sample and hence the actual > > size does not matter. > > > > The macrotile fields are duplicated for images with multiple planes. > > If the planes have different bitdepth they need different macro > > tile fields and different tile split bytes if multisample. > > > > I could not fit multiple copies in for tile split bytes, but > > multisample & multiplane images are very rare. Overall, I think > > we should punt on multisample for a later format version since > > they are generally not shared on any modifier aware windowing > > system, and we have more issues like fmask & cmask. > > > > We need these copies because the drm modifier of all planes in an > > image needs to be equal, so we need to fit these together. > > > > This adds fields for compression support, using metadata that is > > compatible with AMDVLK and for which radv and radeonsi can > > reasonably be extended. > > > > The big open question for compression is between which generations > > the format changed to see if we can share more. > > > > This explicitly does not try to solve the linear stride alignment > > issue, thoguh we could internally just use the tiling modes for > > the linear modes to indicate linear images with the stride for the > > given chip. > > > > Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl> > > CC: Chad Versace <chadversary at chromium.org> > > CC: Dave Airlie <airlied at redhat.com> > > CC: Marek Olšák <marek.olsak at amd.com> > > CC: Nicolai Hähnle <nicolai.haehnle at amd.com> > > CC: Alex Deucher <alexander.deucher at amd.com> > > CC: Daniel Vetter <daniel.vetter at ffwll.ch> > > --- > > include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 130 insertions(+) > > > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > > index 94444eeba55b..4e1452161dbf 100644 > > --- a/include/uapi/drm/amdgpu_drm.h > > +++ b/include/uapi/drm/amdgpu_drm.h > > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > > > +#define AMDGPU_CHIP_TAHITI 0 > > +#define AMDGPU_CHIP_PITCAIRN 1 > > +#define AMDGPU_CHIP_VERDE 2 > > +#define AMDGPU_CHIP_OLAND 3 > > +#define AMDGPU_CHIP_HAINAN 4 > > +#define AMDGPU_CHIP_BONAIRE 5 > > +#define AMDGPU_CHIP_KAVERI 6 > > +#define AMDGPU_CHIP_KABINI 7 > > +#define AMDGPU_CHIP_HAWAII 8 > > +#define AMDGPU_CHIP_MULLINS 9 > > +#define AMDGPU_CHIP_TOPAZ 10 > > +#define AMDGPU_CHIP_TONGA 11 > > +#define AMDGPU_CHIP_FIJI 12 > > +#define AMDGPU_CHIP_CARRIZO 13 > > +#define AMDGPU_CHIP_STONEY 14 > > +#define AMDGPU_CHIP_POLARIS10 15 > > +#define AMDGPU_CHIP_POLARIS11 16 > > +#define AMDGPU_CHIP_POLARIS12 17 > > +#define AMDGPU_CHIP_VEGAM 18 > > +#define AMDGPU_CHIP_VEGA10 19 > > +#define AMDGPU_CHIP_VEGA12 20 > > +#define AMDGPU_CHIP_VEGA20 21 > > +#define AMDGPU_CHIP_RAVEN 22 > > + > > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same > > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > > + * different. > > + * > > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR > > + * instead. > > + * > > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be > > + * set. > > + * > > + * For other ARRAY_MODEs: > > + * - Only set TILE_SPLIT if the image is multisample. > > + * > > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > > + * different value there. The values are > > + * - depth : 0 > > + * - displayable : 1 > > + * - thin : 2 > > + * - thick (GFX6) : 3 > > + * - rotated (GFX7+) : 4 > > + * > > + * TODO: What to do with multisample multi plane images? More tile split > > + * fields don't fit if we want to keep a few bits for a format version. > > + */ > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK 0x3 > > + > > +/* Macrotile parameters for a second plane if existing */ > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT 23 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT 25 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT 29 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK 0x3 > > + > > +/* Macrotile parameters for a third plane if existing */ > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT 31 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT 33 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT 37 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK 0x3 > > + > > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT 0 > > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK 0x1f > > + > > +/* Whether to enable DCC compression. > > + * > > + * If enabled, exporting the surface results in three > > + * planes: > > + * - color data > > + * - DCC data > > + * - a 64-byte block with > > + * - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed. > > + * - a 16-byte 0/1 bool as to whether the surface has fastclear data > > + * - a 8-byte chunk with the current fastclear colors > > + * > > + * To ensure we do not keep compressing and decompressing the surface, once it > > + * has been decompressed no party may recompress again. > > + * > > + * Applications should not hand over images with fastclear data as not > > + * all users can support it, however, to help both Vulkan implementations > > + * with the allocation we keep it in the 64-byte block. > > + * > > + * TODO: Can scanout really not support fastclear data? > > + * TODO: What to do with multiplane images? > > + */ > > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT 39 > > +#define AMDGPU_MODIFIER_COMPRESSION_MASK 0x1 > > Summary of the irc discussion with my 2 recommendations: > > - Make it clear in the spec here that all consumers must follow the 16 > byte boolean to control actual DCC usage (using a conditional register > write in the command buffer). On intel we've put that into the > modifier itself, along the lines of "I don't like compressed anymore, > pls give me something else". Not that runtime renogiation works well > (or well, at all). > > - If you have consumers which can't handle fastclear, make it a > separate modifier. > > > + > > +/* The chip this is compatible with. > > + * > > + * If compression is disabled, use > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > + * > > + * With compression enabled please use the exact chip. > > + * > > + * TODO: Do some generations share DCC format? > > + */ > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > Do you really need all the combinations here of DCC + gpu gen + tiling > details? When we had the entire discussion with nvidia folks they > eventually agreed that they don't need the massive pile with every > possible combination. Do you really plan to share all these different > things? So GPU gen is based on the fact that GFX6-GFX8 and GFX9 had a significant change in tiling modes that are not compatible. Furthermore, DCC formats have seen changes between chips in the same generation and without more documentation on when those are compatible I'd like to be conservative and for DCC put every chip in its own class. For whether to enable DCC: there are basically a bunch of conditions in image creation parameters on when we can use it, so I'd like to keep non-DCC versions available even for HW that supports DCC. Tling on GFX9+ is only 5 bits so that is easy. For GFX6-GFX8 it is complicated. Each chip has a configurable system-wide 32-entry table of tiling modes (GFX7-GFX8 have the components split in two tables, but in practice given a format there is only one choice for the second table). The kernel has different settings for each chip to put in there. So if we do not want to enable sharing between GPUs we can just do a precise chip + tiling mode index and be done with it. However, since the entries are different between chips we have to either: 1) make some kind of mapping from (chip, tiling index) to a canonical value based on compatiblity. Which means for backwards compatibility reasons we're never going to be able to change the table config in the kernel again. 2) include the value of the table entry in the modifier. I chose solution 2 here. This means that while there is a lot of data in here in practice the tiling mode only has ~32 options per chip per format/bitdepth. There are some blocks not bound by this table (mostly scanout), but in practice the GL/vulkan driver just allocates from the tiling table anyway so kernel driver is just going to have to expose the modifiers corresponding to the tiling table to keep the amount limited. So as far as fields go, we have ARRAY_MODE = is this linear, micro-tiling, macro-tiling or thick-macro tiling (for 3d textures, various thicknesses), sparse texture. linear is not relevant here. micro-tiling is relevant for cross-GPU sharing if macro-tiling is not compatible. macro-tiling is optimal for 2d, sparse-texture could probably be aliased/out of scope. MICRO_TILE = normal, display, depth, rotated display I think these are obviously worth keeping PIPE_CONFIG = pretty much something constant over a chip but important for everything that is not linear TILE_SPLIT = relevant for multisample images. I think in practice constant for pretty much everything except GFX6, where it is format specific BANK_WIDTH,BANK_HEIGHT,MACRO_TILE_ASPEC,NUM_BANKS = for macro tiles, pretty much specific to chip + format, mostly independent of everything else. because of the bitdepth dependence we have to have multiple for multiplane images as some are e.g. one R plane + one GB plane. > > Note that e.g. on i915 we spec some of the tiling depending upon > buffer size and buffer format (because that's how the hw works), not > using explicit modifier flags for everything. > -Daniel > > > + > > +#define AMDGPU_MODIFIER_SET(field, value) \ > > + (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT) > > +#define AMDGPU_MODIFIER_GET(value, field) \ > > + (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK) > > + > > /* > > * Definition of free sync enter and exit signals > > * We may have more options in the future > > -- > > 2.18.0 > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch