On Monday, 15.02.2021 at 10:34 +0100, Christian König wrote:
>
> On 15.02.21 at 10:06, Simon Ser wrote:
> > On Monday, February 15th, 2021 at 9:58 AM, Christian König <christian.koenig@xxxxxxx> wrote:
> >
> > > we are currently working on FreeSync and direct scan out from system
> > > memory on AMD APUs in A+A laptops.
> > >
> > > One problem we stumbled over is that our display hardware needs to scan
> > > out from uncached system memory and we currently don't have a way to
> > > communicate that through DMA-buf.
> > >
> > > For our specific use case at hand we are going to implement something
> > > driver specific, but the question is should we have something more
> > > generic for this?
> > >
> > > After all the system memory access pattern is a PCIe extension and as
> > > such something generic.
> > Intel also needs uncached system memory if I'm not mistaken?
>
> No idea, that's why I'm asking. Could be that this is also interesting
> for I+A systems.
>
> > Where are the buffers allocated? If GBM, then it needs to allocate memory that
> > can be scanned out if the USE_SCANOUT flag is set or if a scanout-capable
> > modifier is picked.
> >
> > If this is about communicating buffer constraints between different components
> > of the stack, there were a few proposals about it. The most recent one is [1].
>
> Well the problem here is on a different level of the stack.
>
> See resolution, pitch etc. can easily be communicated in userspace
> without involvement of the kernel. The worst thing which can happen is
> that you draw garbage into your own application window.
>
> But if you get the caching attributes in the page tables (both CPU as
> well as IOMMU, device etc.) wrong then ARM for example has the
> tendency to just spontaneously reboot.
>
> X86 is fortunately a bit more graceful and you only end up with random
> data corruption, but that is only marginally better.
>
> So to sum it up that is not something which we can leave in the hands of
> userspace.
>
> I think that exporters in the DMA-buf framework should have the ability
> to tell importers if the system memory snooping is necessary or not.

There is already a coarse-grained way to do so: the dma_coherent
property in struct device, which you can check at dmabuf attach time.

However it may not be enough for the requirements of a GPU where the
engines could differ in their dma coherency requirements. For that you
need to either have fake struct devices for the individual engines or
come up with a more fine-grained way to communicate those requirements.

> Userspace components can then of course tell the exporter what the
> importer needs, but validation if that stuff is correct and doesn't
> crash the system must happen in the kernel.

What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?

Regards,
Lucas
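
To make the attach-time check discussed above concrete, here is a minimal
userspace sketch in C. It is only a model, not kernel code: every type and
function name in it (model_device, model_engine, model_buffer,
engine_is_coherent, model_attach) is hypothetical and invented for
illustration. It assumes the simplest policy, namely that an importer which
does not snoop CPU caches may only attach to a buffer the CPU maps uncached.
A real exporter might instead do cache maintenance or change the mapping.
The per-engine override stands in for the "more fine-grained way" mentioned
above and is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; NOT the kernel's struct device. */
struct model_device {
	const char *name;
	bool dma_coherent;	/* coarse-grained: device DMA snoops CPU caches */
};

/* Hypothetical per-engine view, for GPUs whose engines differ. */
struct model_engine {
	const struct model_device *dev;
	bool has_override;	/* true if this engine deviates from its device */
	bool dma_coherent;	/* only meaningful when has_override is set */
};

enum model_cache_mode { MODEL_CACHED, MODEL_UNCACHED };

/* Hypothetical model of an exported buffer. */
struct model_buffer {
	enum model_cache_mode cpu_mapping;
};

/* Effective coherency: the engine override wins, else the device flag. */
bool engine_is_coherent(const struct model_engine *eng)
{
	assert(eng != NULL && eng->dev != NULL);
	return eng->has_override ? eng->dma_coherent : eng->dev->dma_coherent;
}

/*
 * Attach-time validation. A non-snooping importer reading a CPU-cached
 * buffer could see stale data, so this model rejects that combination.
 * Returns 0 on success or -EINVAL on a coherency mismatch.
 */
int model_attach(const struct model_buffer *buf,
		 const struct model_engine *importer)
{
	assert(buf != NULL);
	if (!engine_is_coherent(importer) && buf->cpu_mapping == MODEL_CACHED)
		return -EINVAL;
	return 0;
}
```

With a device marked dma_coherent, a render engine without an override can
attach to both cached and uncached buffers, while a scanout engine whose
override marks it non-coherent is refused on a cached buffer and accepted on
an uncached one. The point of the sketch is only that the decision is made by
the exporter at attach time rather than left to userspace.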