On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones@xxxxxxxxxx> wrote: > > On 10/15/19 7:19 AM, Daniel Vetter wrote: > > On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote: > >> Builds upon the existing NVIDIA 16Bx2 block linear > >> format modifiers by adding more "fields" to the > >> existing parameterized > >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier > >> macro that allow fully defining a unique-across- > >> all-NVIDIA-hardware bit layout using a minimal > >> set of fields and values. The new modifier macro > >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is > >> effectively backwards compatible with the existing > >> macro, introducing a superset of the previously > >> definable format modifiers. > >> > >> Backwards compatibility has two quirks. First, > >> the zero value for the "kind" field, which is > >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK > >> macro, must be special cased in drivers and > >> assumed to map to the pre-Turing generic kind of > >> 0xfe, since a kind of "zero" is reserved for > >> linear buffer layouts on all GPUs. > >> > >> Second, it is assumed backwards compatibility > >> is only needed when running on Tegra GPUs, and > >> specifically Tegra GPUs prior to Xavier. This > >> is based on two assertions: > >> > >> -Tegra GPUs prior to Xavier used a slightly > >> different raw bit layout than desktop GPUs, > >> making it impossible to directly share block > >> linear buffers between the two. > >> > >> -Support for the existing block linear modifiers > >> was incomplete, making them useful only for > >> exporting buffers created by nouveau and > >> importing them to Tegra DRM as framebuffers for > >> scan out. There was no support for adding > >> framebuffers using format modifiers in nouveau, > >> nor importing dma-buf/PRIME GEM objects into > >> nouveau userspace drivers with modifiers in Mesa. > >> > >> Hence it is assumed the prior modifiers were not > >> intended for use on desktop GPUs, and as a > >> corrolary, were not intended to support sharing > >> block linear buffers across two different NVIDIA > >> GPUs. > >> > >> Signed-off-by: James Jones <jajones@xxxxxxxxxx> > >> --- > >> include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++--- > >> 1 file changed, 100 insertions(+), 8 deletions(-) > >> > >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > >> index 3feeaa3f987a..cc9853d42a24 100644 > >> --- a/include/uapi/drm/drm_fourcc.h > >> +++ b/include/uapi/drm/drm_fourcc.h > >> @@ -497,7 +497,99 @@ extern "C" { > >> #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) > >> > >> /* > >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later > >> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, > >> + * and Tegra GPUs starting with Tegra K1. > >> + * > >> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies > >> + * based on the architecture generation. GOBs themselves are then arranged in > >> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power > >> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents > >> + * a block depth or height of "4"). > >> + * > >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format > >> + * in full detail. > >> + * > >> + * Macro > >> + * Bits Param Description > >> + * ---- ----- ----------------------------------------------------------------- > >> + * > >> + * 3:0 h log2(height) of each block, in GOBs. Placed here for > >> + * compatibility with the existing > >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > >> + * > >> + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for > >> + * compatibility with the existing > >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. > >> + * > >> + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block > >> + * size). Must be zero. > >> + * > >> + * Note there is no log2(width) parameter. Some portions of the > >> + * hardware support a block width of two gobs, but it is impractical > >> + * to use due to lack of support elsewhere, and has no known > >> + * benefits. > >> + * > >> + * 11:9 - Reserved (To support 2D-array textures with variable array stride > >> + * in blocks, specified via log2(tile width in blocks)). Must be > >> + * zero. > >> + * > >> + * 19:12 k Page Kind. This value directly maps to a field in the page > >> + * tables of all GPUs >= NV50. It affects the exact layout of bits > >> + * in memory and can be derived from the tuple > >> + * > >> + * (format, GPU model, compression type, samples per pixel) > >> + * > >> + * Where compression type is defined below. If GPU model were > >> + * implied by the format modifier, format, or memory buffer, page > >> + * kind would not need to be included in the modifier itself, but > >> + * since the modifier should define the layout of the associated > >> + * memory buffer independent from any device or other context, it > >> + * must be included here. > >> + * > >> + * To grandfather in prior block linear format modifiers to this > >> + * layout, the page kind "0", which corresponds to "pitch/linear" > >> + * and hence is unusable with block-linear layouts, is remapped > >> + * within drivers to the value 0xfe, which corresponds to the > >> + * "generic" kind used for simple single-sample color formats on > >> + * pre-Turing GPUs. > > > > Hm, maybe a tiny static inline function which canonizalizes modifiers? > > Something like > > > > static inline u64 > > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool > > is_pre_turing) > > { > > } > > > > Would then give you a nice place to stick this backward compat note and > > make it really clear what should be done. I think establishing this as a > > pattern would also be nice, since I'm sure we'll have a pile more of these > > cases where modifiers turn out to assume a few too many things about the > > platform they're used on (we have a similar case on the intel side too). > > To make sure I'm clear, it would behave like this? > > fixed_mod = canonicalize(old_style_valid_mod, true); > assert(fixed_mod == old_style_valid_mod | (0xfe << 12)); > fixed_mod = canonicalize(new_style_valid_mod, [false,true]); > assert(fixed_mod == new_style_valid_mod); > > I'm unclear what it should do in this case though: > > fixed_mod = canonicalize(old_style_valid_mod, false); > > Since there's no code out there using the old style modifiers with > Turing+ yet, and I don't want to try to support such usage. Maybe just > drop the "is_pre_turing" parameter and always canonicalize by mapping 0 > -> 0xfe as the comment above states, and not touching other values? Or > should any invalid modifier, including this case, return > DRM_FORMAT_MOD_INVALID? This latter idea seems risky because it would > cause software compiled against old drm_fourcc.h to potentially reject > format modifiers from newer kernels or libraries with an expanded > representation, but maybe that's what we want if only driver components > are supposed to call this function. That's where I'm showing that I have no clue about nvidia buffer formats I guess :-) If 0 never makes sense, then yeah I guess you could just unconditionally canonicalize. Maybe you want to reject the old/legacy style on turing+ plus, but that's a tradeoff up to you guys - as long as it's consistent across all involved drivers. Not rejecting it would essentially make 0 an alias for 0xfe everywhere, post and pre-turing. More reasons for having a shared canonicalize function which enforces the same behaviour (whichever you pick) everywhere I think. And yeah the canonicalize function would only be called by drivers, not by clients. -Daniel > Thanks, > -James > > > Just a drive-by idea, feel free to ignore. > > > > Cheers, Daniel > > > >> + * > >> + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed > >> + * starting with Fermi GPUs. Additionally, the mapping between page > >> + * kind and bit layout has changed at various points. > >> + * > >> + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping > >> + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping > >> + * 2 = Gob Height 8, Turing+ Page Kind mapping > >> + * 3 = Reserved for future use. > >> + * > >> + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further > >> + * bit remapping step that occurs at an even lower level than the > >> + * page kind and block linear swizzles. This causes the layout of > >> + * surfaces mapped in those SOC's GPUs to be incompatible with the > >> + * equivalent mapping on other GPUs in the same system. > >> + * > >> + * 0 = Tegra K1 - Tegra Parker/TX2 Layout. > >> + * 1 = Desktop GPU and Tegra Xavier+ Layout > >> + * > >> + * 24:23 c Lossless Framebuffer Compression type. > >> + * > >> + * 0 = none > >> + * 1 = ROP/3D, actual compression implied by the Page Kind field > >> + * 2 = CDE horizontal > >> + * 3 = CDE vertical > >> + * > >> + * 55:25 - Reserved for future use. Must be zero. > >> + */ > >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \ > >> + fourcc_mod_code(NVIDIA, (0x10 | \ > >> + ((h) & 0xf) | \ > >> + (((k) & 0xff) << 12) | \ > >> + (((g) & 0x3) << 20) | \ > >> + (((s) & 0x1) << 22) | \ > >> + (((c) & 0x3) << 23))) > >> + > >> +/* > >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later > >> * > >> * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked > >> * vertically by a power of 2 (1 to 32 GOBs) to form a block. > >> @@ -518,20 +610,20 @@ extern "C" { > >> * in full detail. > >> */ > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \ > >> - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf)) > >> + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v)) > >> > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x10) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x11) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x12) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x13) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x14) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4) > >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \ > >> - fourcc_mod_code(NVIDIA, 0x15) > >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5) > >> > >> /* > >> * Some Broadcom modifiers take parameters, for example the number of > >> -- > >> 2.17.1 > >> > > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel