On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> Builds upon the existing NVIDIA 16Bx2 block linear
> format modifiers by adding more "fields" to the
> existing parameterized
> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> macro that allow fully defining a
> unique-across-all-NVIDIA-hardware bit layout using a minimal
> set of fields and values. The new modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> effectively backwards compatible with the existing
> macro, introducing a superset of the previously
> definable format modifiers.
>
> Backwards compatibility has two quirks. First,
> the zero value for the "kind" field, which is
> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> macro, must be special cased in drivers and
> assumed to map to the pre-Turing generic kind of
> 0xfe, since a kind of "zero" is reserved for
> linear buffer layouts on all GPUs.
>
> Second, it is assumed backwards compatibility
> is only needed when running on Tegra GPUs, and
> specifically Tegra GPUs prior to Xavier. This
> is based on two assertions:
>
> -Tegra GPUs prior to Xavier used a slightly
>  different raw bit layout than desktop GPUs,
>  making it impossible to directly share block
>  linear buffers between the two.
>
> -Support for the existing block linear modifiers
>  was incomplete, making them useful only for
>  exporting buffers created by nouveau and
>  importing them to Tegra DRM as framebuffers for
>  scan out. There was no support for adding
>  framebuffers using format modifiers in nouveau,
>  nor importing dma-buf/PRIME GEM objects into
>  nouveau userspace drivers with modifiers in Mesa.
>
> Hence it is assumed the prior modifiers were not
> intended for use on desktop GPUs, and as a
> corollary, were not intended to support sharing
> block linear buffers across two different NVIDIA
> GPUs.
>
> Signed-off-by: James Jones <jajones@xxxxxxxxxx>
> ---
>  include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>  1 file changed, 100 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..cc9853d42a24 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -497,7 +497,99 @@ extern "C" {
>  #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>
>  /*
> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> + * and Tegra GPUs starting with Tegra K1.
> + *
> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies
> + * based on the architecture generation. GOBs themselves are then arranged in
> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> + * a block depth or height of "4").
> + *
> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> + * in full detail.
> + *
> + *       Macro
> + * Bits  Param Description
> + * ----  ----- -----------------------------------------------------------------
> + *
> + *  3:0  h     log2(height) of each block, in GOBs. Placed here for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  4:4  -     Must be 1, to indicate block-linear layout. Necessary for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
> + *             size). Must be zero.
> + *
> + *             Note there is no log2(width) parameter. Some portions of the
> + *             hardware support a block width of two gobs, but it is impractical
> + *             to use due to lack of support elsewhere, and has no known
> + *             benefits.
> + *
> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
> + *             in blocks, specified via log2(tile width in blocks)). Must be
> + *             zero.
> + *
> + * 19:12 k     Page Kind. This value directly maps to a field in the page
> + *             tables of all GPUs >= NV50. It affects the exact layout of bits
> + *             in memory and can be derived from the tuple
> + *
> + *               (format, GPU model, compression type, samples per pixel)
> + *
> + *             Where compression type is defined below. If GPU model were
> + *             implied by the format modifier, format, or memory buffer, page
> + *             kind would not need to be included in the modifier itself, but
> + *             since the modifier should define the layout of the associated
> + *             memory buffer independent from any device or other context, it
> + *             must be included here.
> + *
> + *             To grandfather in prior block linear format modifiers to this
> + *             layout, the page kind "0", which corresponds to "pitch/linear"
> + *             and hence is unusable with block-linear layouts, is remapped
> + *             within drivers to the value 0xfe, which corresponds to the
> + *             "generic" kind used for simple single-sample color formats on
> + *             pre-Turing GPUs.

Hm, maybe a tiny static inline function which canonicalizes modifiers?
Something like

static inline u64
drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifier, bool is_pre_turing)
{
}

Would then give you a nice place to stick this backward compat note and
make it really clear what should be done. I think establishing this as a
pattern would also be nice, since I'm sure we'll have a pile more of these
cases where modifiers turn out to assume a few too many things about the
platform they're used on (we have a similar case on the intel side too).

Just a drive-by idea, feel free to ignore.

Cheers, Daniel

> + *
> + * 21:20 g     GOB Height and Page Kind Generation. The height of a GOB changed
> + *             starting with Fermi GPUs. Additionally, the mapping between page
> + *             kind and bit layout has changed at various points.
> + *
> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> + *               3 = Reserved for future use.
> + *
> + * 22:22 s     Sector layout. On Tegra GPUs prior to Xavier, there is a further
> + *             bit remapping step that occurs at an even lower level than the
> + *             page kind and block linear swizzles. This causes the layout of
> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
> + *             equivalent mapping on other GPUs in the same system.
> + *
> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> + *
> + * 24:23 c     Lossless Framebuffer Compression type.
> + *
> + *               0 = none
> + *               1 = ROP/3D, actual compression implied by the Page Kind field
> + *               2 = CDE horizontal
> + *               3 = CDE vertical
> + *
> + * 55:25 -     Reserved for future use. Must be zero.
> + */
> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> +        fourcc_mod_code(NVIDIA, (0x10 | \
> +                                 ((h) & 0xf) | \
> +                                 (((k) & 0xff) << 12) | \
> +                                 (((g) & 0x3) << 20) | \
> +                                 (((s) & 0x1) << 22) | \
> +                                 (((c) & 0x3) << 23)))
> +
> +/*
> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>   *
>   * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>   * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> @@ -518,20 +610,20 @@ extern "C" {
>   * in full detail.
>   */
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> -        fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> +        DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> -        fourcc_mod_code(NVIDIA, 0x10)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> -        fourcc_mod_code(NVIDIA, 0x11)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> -        fourcc_mod_code(NVIDIA, 0x12)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> -        fourcc_mod_code(NVIDIA, 0x13)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> -        fourcc_mod_code(NVIDIA, 0x14)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> -        fourcc_mod_code(NVIDIA, 0x15)
> +        DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>
>  /*
>   * Some Broadcom modifiers take parameters, for example the number of
> --
> 2.17.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Nouveau mailing list
Nouveau@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/nouveau
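
For illustration only, here is a minimal sketch of what the canonicalization
helper suggested in the review might look like. It is a guess at the intent,
not code from the patch or from any driver. The names
nv_block_linear_2d_canonicalize(), nv_mod_page_kind() and example_usage() are
hypothetical, the vendor and fourcc_mod_code() definitions are local copies so
the example builds outside the kernel tree, and leaving the modifier untouched
when is_pre_turing is false is an assumption (a real driver may prefer to
reject such a modifier).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi definitions so the sketch builds on its own. */
#define DRM_FORMAT_MOD_VENDOR_NVIDIA 0x03
#define fourcc_mod_code(vendor, val) \
	((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
	 ((val) & 0x00ffffffffffffffULL))

/* Macro as proposed in the quoted patch. */
#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
	fourcc_mod_code(NVIDIA, (0x10 | \
				 ((h) & 0xf) | \
				 (((k) & 0xff) << 12) | \
				 (((g) & 0x3) << 20) | \
				 (((s) & 0x1) << 22) | \
				 (((c) & 0x3) << 23)))

/* Hypothetical helper: extract the 8-bit page kind (bits 19:12). */
static inline unsigned int nv_mod_page_kind(uint64_t modifier)
{
	return (unsigned int)((modifier >> 12) & 0xff);
}

/*
 * Hypothetical helper: map a legacy 16Bx2 modifier (kind == 0) to the
 * pre-Turing generic kind 0xfe. Anything that is not an NVIDIA
 * block-linear modifier, or that already carries a kind, is returned
 * unchanged. Doing nothing when !is_pre_turing is an assumption.
 */
static inline uint64_t
nv_block_linear_2d_canonicalize(uint64_t modifier, bool is_pre_turing)
{
	bool is_nvidia = (modifier >> 56) == DRM_FORMAT_MOD_VENDOR_NVIDIA;
	bool is_block_linear = (modifier & 0x10) != 0;

	if (!is_nvidia || !is_block_linear || !is_pre_turing)
		return modifier;
	if (nv_mod_page_kind(modifier) != 0)
		return modifier;

	return modifier | ((uint64_t)0xfe << 12);
}

/* Usage: a legacy modifier with log2(block height) = 4 and kind 0. */
static void example_usage(void)
{
	uint64_t legacy = DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, 4);
	uint64_t fixed = nv_block_linear_2d_canonicalize(legacy, true);

	assert(nv_mod_page_kind(fixed) == 0xfe);
	assert((fixed & 0xf) == 4);
	/* Without the pre-Turing assumption nothing is rewritten. */
	assert(nv_block_linear_2d_canonicalize(legacy, false) == legacy);
}
```

Keeping the remap in one helper means drivers can compare canonical values
directly, and the backward-compatibility note has a single place to live.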