On Tue, 2024-05-14 at 12:25 +0300, Joonas Lahtinen wrote: > Quoting Kenneth Graunke (2024-05-11 03:58:34) > > On Tuesday, May 7, 2024 3:56:57 PM PDT Matt Roper wrote: > > > On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila > > > wrote: > > > > These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS > > > > modifier, which, > > > > from the kernel's perspective, behaves similarly to > > `I915_FORMAT_MOD_4_TILED`. > > > > This new modifier is primarily intended for user space to > > > > effectively > > monitor > > > > compression status, especially when dealing with a mix of > > > > compressed and > > > > uncompressed buffers. > > > > > > > > The addition of this modifier facilitates user space in > > > > managing > > compression > > > > status, particularly when utilizing both compressed and > > > > uncompressed > > buffers > > > > concurrently. To leverage compression for these buffers, user > > > > space > > > > applications must configure the appropriate Page Attribute > > > > Table (PAT) > > index. > > > > Display engine will treat all Tile4 as if it were compressed > > > > under all > > > > circumstances on Xe2 architecture. > > > > > > I may have missed some discussion about this, but I thought the > > > previous > > > consensus was that we didn't want/need new modifiers for > > > compression on > > > Xe2? If a userspace client (or the display hardware) receives a > > > buffer > > > of unknown origin and unknown compression status, it's always > > > fine to > > > select a compressed PAT when binding the buffer to read since > > > even for > > > uncompressed buffers the CCS metadata will accurately reflect the > > > compression status. Unlike Xe1, where generating content without > > > compression enabled would leave random garbage in the FlatCCS > > > area, Xe2 > > > will set the corresponding FlatCCS to '0x0' for each block, > > > indicating > > > uncompressed data. > > > > > > Can you explain more what the benefit of handling these modifiers > > > explicitly is? > > > > > > > > > Matt > > > > Thanks, Matt! I'm a bit late in getting up to speed with the Xe2 > > compression > > changes; this is really good information. > > > > As I understand it...all blocks on the GPU behave in the way you > > mentioned, > > where generating uncompressed data via the GPU will set FlatCCS = > > 0, so you > > can assume a compressed PAT entry and everything works. > > > > One snag is...I've heard that CPU access doesn't work that way. > > So, if you > > mmap a buffer on the CPU, and write data with the CPU, then I think > > we're back > > to the "FlatCCS contains uninitialized garbage" case, where it's > > unsafe to > > assume a compressed PAT. And... we don't really know when sharing > > buffers > > whether the other side is going to want to do CPU access. > > I think the previous discussion has specifically happened in the > context of > dma-buf, so not only CPU but other GPUs/accelerators/decoders/devices > in the > system are also relevant. > > It boils down to the fact that when exporting a dma-buf, one can't > know it will > be consumed only by the same GPU (or other device for that matter) > unless there > is an explicit negotiation between exporter and importers. > > > It would be really nice to assume compression by default, though, > > which got me > > thinking: if we mmap a buffer via DRM_XE_GEM_MMAP_OFFSET, could > > xe.ko disable > > compression for us? So, resolve any outstanding CCS data, and then > > switch any > > PAT entries to uncompressed. Mapping would block until that > > resolve is done. > > It could leave compression off forever (once you CPU map a buffer, > > it's never > > compressed again). Or it could turn CCS back on when map count > > reaches 0 (but > > frankly I'm not sure that's terribly important, and sounds more > > complex). > > This would only really work for a single device but the dma-buf is > specifically targeting more generic sharing. There's no built-in > mechanism > to limit the sharing to subset of devices without explicit > negotiation > between the exporter and importers. > > So I think the "by default" mode needs to be interoperable, and the > explicit negotiation can then use less compatible formats given those > FD > are never passed to importers that were not part of the negotiation. > > > As I understand it, at least on discrete GPUs, the kernel already > > has to do > > something similar for eviction, when migrating BOs to system memory > > (which > > doesn't support compression). So this would be doing basically the > > same > > "resolve and disable CCS" step the kernel can presumably already > > do, but now > > on mmap as well. > > So far the eviction strategy has been to copy both the backing store > and > compression bits in raw form. With Xe2 it would indeed be possible to > do > an implicit resolve IFF the buffer has not been shared to someone who > doesn't > understand compression and might have left garbage in the CCS bits. > > When evicting in raw form, kernel doesn't have to know if the CCS > bits > are garbage or not on any given moment. > > Regards, Joonas Just a follow-up comment (TBC), IF we're going for the everything-is- compressed approach, I think there are some considerations to be made: dma-buf exports to foreign devices need to resolve at map_attachment time. Foreign devices are all devices that can't interpret the compressed content. dma-buf mmaps need to resolve. IIRC Could be implemented in the DMA_BUF_IOCTL_SYNC callbacks that wrap cpu access. dma-buf imports from foreign devices need to never use compressed PAT for writing. Should KMD enforce this? Implicitly? Explicitly? I don't see how UMD could know whether the imported dma-buf is from a foreign device. For mmaps of buffer objects of local device a resolve is needed. Having KMD do that on a pagefault-basis is definitely possible, but will most likely be terribly inefficient. Better to leave that to UMD? /Thomas > > > > > What do you think? Viable? Crazy? Have I missed something? > > > > --Ken