Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

Daniel Vetter <daniel@xxxxxxxx> · Tue, 4 Sep 2018 16:43:39 +0200

On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
<bas@xxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>>
>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>> > >
>> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> > > > > Hi,
>> > > > >
>> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> wrote:
>> > > > > > > +/* The chip this is compatible with.
>> > > > > > > + *
>> > > > > > > + * If compression is disabled, use
>> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> > > > > > > + *
>> > > > > > > + * With compression enabled please use the exact chip.
>> > > > > > > + *
>> > > > > > > + * TODO: Do some generations share DCC format?
>> > > > > > > + */
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
>> > > > > > details? When we had the entire discussion with nvidia folks they
>> > > > > > eventually agreed that they don't need the massive pile with every
>> > > > > > possible combination. Do you really plan to share all these different
>> > > > > > things?
>> > > > > >
>> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
>> > > > > > buffer size and buffer format (because that's how the hw works), not
>> > > > > > using explicit modifier flags for everything.
>> > > > > Right. The conclusion, after people went through and started sorting
>> > > > > out the kinds of formats for which they would _actually_ export real
>> > > > > colour buffers for, that most vendors definitely have fewer than
>> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > > > > possible formats to represent, very likely fewer than
>> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
>> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> > > > > bits of the 56 available.
>> > > >
>> > > > The problem here is that at least for some parameters we actually don't know
>> > > > which formats are actually used.
>> > > >
>> > > > The following are not real world examples, but just to explain the general
>> > > > problem.
>> > > >
>> > > > The memory configuration for example can be not ASIC specific, but rather
>> > > > determined by whoever took the ASIC and glued it together with VRAM on a
>> > > > board. It is not likely that somebody puts all the VRAM chips on one
>> > > > channel, but it is still perfectly possible.
>> > > >
>> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them
>> > > > could be bad and we need to know which to actually use.
>> > >
>> > > For my understanding: This leaks outside the chip when sharing buffers?
>> > > All the information you only need locally to a given amdgpu instance
>> > > don't really need to be encoded in modifiers.
>> > >
>> > > Pointers to code where this is all decided (kernel and radeonsi would be
>> > > good starters I guess) would be really good here.
>> >
>> > I extracted the information on which bits are relevant mostly from the
>> > AddrFromCoord functions in addrlib in mesa:
>> >
>> > for macro-tiles:
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >
>> > for micro-tiles (or the micro-tiles in macro-tiles):
>> >
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>
>> So this is the decoding thing. How many of these actually exist, even when
>> taking all the other information into account?
>>
>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> + height + width, how much of all these bits do you actually still freely
>> pick?
>
> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> depth, display-rotated) + whether to use compression, everything else
> is fixed given those option, the properties of the chip and the
> format.
>
>
>>
>> It might be that all the things you need to know from the memory config
>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> kinda the angle that we looked at this for everyone else.
>>
>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> becomes part of the implied stuff in the modifier.
>
> The problem is some GPUs are compatible for say 8-bpp images, but not
> for 32-bpp surfaces. e.g. lets look at the following table showing the
> current configuration for all GFX6-GFX8 GPU:
>
> format: (bank width, bank height, macro tile aspect, num banks) for
> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>
> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> ADDR_SURF_P8_32x32_8x16
> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>
> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> compatible for 8-bpp.
>
> So we can't just assume that if the first plane properties match that
> they do so for the second plane, because we don't know what GPU it is
> coming from.
>
> And we can make a canonical table from this but then if we change the
> above tables in the kernel that runs into compatibility issues.

I think that's roughly what nvidia ended up doing (not sure they
published anything, haven't seen any patches on dri-devel). Two parts:
- Bunch of flags/bits for the major mode, like {display, non-display,
DCC} and so on.
- For each of the major modes a list of enumerated modes actually used
in reality. For DCC this would be the generation, for more plain
formats it would be the above table. Table entries would be per-bpp
(in case that matters, or whatever matters for amd gpus).

Imo would be totally ok to put the relevant lookup tables into
drm_fourcc.h directly. Or any other suitable place.

>From a quick look about 16 bits total. Gives you 40bits of second
chances and "oops totally sorry".
-Daniel

>
>> -Daniel
>>
>> >
>> > > -Daniel
>> > >
>> > > >
>> > > > Regards,
>> > > > Christian.
>> > > >
>> > > > >
>> > > > > If you do use 256 bits in order to represent
>> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
>> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
>> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
>> > > > > with some kind of eBPF script.
>> > > > >
>> > > > > Cheers,
>> > > > > Daniel
>> > > > > _______________________________________________
>> > > > > dri-devel mailing list
>> > > > > dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> > > >
>> > >
>> > > --
>> > > Daniel Vetter
>> > > Software Engineer, Intel Corporation
>> > > http://blog.ffwll.ch
>> > > _______________________________________________
>> > > dri-devel mailing list
>> > > dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel