On Fri, 29 May 2020 at 15:36, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> On Fri, May 29, 2020 at 10:32 AM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
> > On Fri, 29 May 2020 at 15:29, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> > > Maybe I'm overthinking this. I just don't want to get into a
> > > situation where we go through a lot of effort to add modifier
> > > support and then performance ends up being worse than it is
> > > today in a lot of cases.
> >
> > I'm genuinely curious: what do you imagine could cause a worse result?
>
> As an example, in some cases it's actually better to use linear for
> system memory because it aligns better with PCIe access patterns than
> some tiling formats do (the tiling formats being better aligned with
> the memory controller topology on the dGPU). That said, I haven't
> been in the loop as much with the tiling formats on newer GPUs, so
> that may not be as much of an issue anymore.

Yeah, that makes a lot of sense. On the other hand, placement isn't
explicitly encoded for either modifiers or non-modifiers, so I'm not
sure how it would really regress.

In case it was missed somewhere: there is no generic code anywhere
doing modifier selection for optimality. The flow is:
  - every producer/consumer advertises a list of modifier + format
    pairs, declaring what it _can_ support
  - for every use where a buffer needs to be allocated, the generic
    code intersects these lists of modifiers to determine the set of
    modifiers mutually acceptable to all consumers
  - the buffer allocator is always handed a _list_ of modifiers, and
    makes its own decision based on ??

(Both the advertise and the intersect-and-allocate steps are sketched
in code below.)

For a concrete end-to-end example:
  - KMS declares which modifiers are supported for scanout
  - EGL declares which modifiers are supported for EGLImage import
  - Weston determines that one of its clients could be directly
    scanned out rather than composited
  - Weston intersects the KMS + EGL sets of modifiers to come up with
    the set of modifiers which allow direct scanout (i.e. bypassing
    composition)
  - Weston sends this intersected list to the client via the Wayland
    protocol (mentioned in the previous MR)
  - the client is using EGL, so Mesa receives this list of modifiers
    and passes it on to amdgpu
  - amdgpu uses magic inscrutable heuristics to determine the most
    optimal modifier to use, and allocates a buffer based on that

Weston (or GNOME Shell, or Chromium, or whatever) will never be in a
position, as a generic client, to know that on Raven2 it should use a
particular supertiled layout with no DCC if width > 2048. So we
designed the entire framework to explicitly avoid generic code trying
to reason about the performance properties of specific modifiers.

What Weston _does_ know, however, is that the display controller can
work with modifier set A, and the GPU can work with modifier set B. If
the client can pick something from modifier set A, then there is a
much greater probability that Weston can leave the GPU alone so it can
be entirely used by the client. It also knows that if the surface
can't be directly scanned out for whatever reason, then there's no
point in the client optimising for direct scanout, and it can tell the
client to select based on optimality purely for the GPU.
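To make the shape of that concrete, here's a rough sketch of the
"EGL declares which modifiers are supported" step. It assumes the
EGL_EXT_image_dma_buf_import_modifiers extension is available,
hardcodes XRGB8888, and skips error handling; it's illustrative
rather than lifted from any real compositor:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

static EGLint
query_egl_modifiers(EGLDisplay dpy, EGLuint64KHR *mods, EGLint max_mods)
{
    PFNEGLQUERYDMABUFMODIFIERSEXTPROC query_modifiers =
        (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
            eglGetProcAddress("eglQueryDmaBufModifiersEXT");
    EGLint num = 0;

    /* Ask the EGL implementation which modifiers it can import for
     * XRGB8888; every producer/consumer in the system advertises a
     * list like this one for each format it supports. */
    if (!query_modifiers ||
        !query_modifiers(dpy, DRM_FORMAT_XRGB8888, max_mods, mods,
                         NULL, &num))
        return 0;

    return num;
}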
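And a similarly rough sketch of the intersect-and-allocate steps.
intersect_modifiers() and alloc_for_scanout() are made-up names for
illustration; gbm_bo_create_with_modifiers() is the real GBM entry
point, and the important bit is that it is handed the whole
mutually-acceptable list and the driver makes the final choice:

#include <stdint.h>
#include <gbm.h>

#define MAX_MODS 64

static unsigned int
intersect_modifiers(const uint64_t *a, unsigned int na,
                    const uint64_t *b, unsigned int nb,
                    uint64_t *out)
{
    unsigned int n = 0;

    /* Keep only the modifiers both sides advertised: the set
     * mutually acceptable to all consumers. */
    for (unsigned int i = 0; i < na && n < MAX_MODS; i++) {
        for (unsigned int j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                break;
            }
        }
    }

    return n;
}

static struct gbm_bo *
alloc_for_scanout(struct gbm_device *gbm, uint32_t width, uint32_t height,
                  const uint64_t *kms_mods, unsigned int n_kms,
                  const uint64_t *egl_mods, unsigned int n_egl)
{
    uint64_t mods[MAX_MODS];
    unsigned int n = intersect_modifiers(kms_mods, n_kms,
                                         egl_mods, n_egl, mods);

    if (n == 0)
        return NULL; /* no mutually acceptable modifier; fall back */

    /* The allocator gets the whole intersected list; the driver's
     * own heuristics pick whichever entry it considers optimal. */
    return gbm_bo_create_with_modifiers(gbm, width, height,
                                        GBM_FORMAT_XRGB8888, mods, n);
}

Note that nothing generic ever ranks the individual entries; the
intersection only narrows what the driver may pick from.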
So that's the thinking behind the interface: the driver still has
exactly as much control and ability to use magic heuristics as it
always has, but system components can supplement the driver's
heuristics with their own knowledge, to increase the chance that the
driver arrives at a configuration that a) will definitely work, and
b) has a much greater chance of working optimally.

Does that help at all?

Cheers,
Daniel