On 03/07/2019 17:18, Daniel Vetter wrote:
> On Wed, Jul 3, 2019 at 6:11 PM Steven Price <steven.price@xxxxxxx> wrote:
[...]
>> In theory the exporter should do whatever is required to ensure that
>> the CPU is synchronised when a user space mapping exists. There are
>> some issues here though:
>>
>>  * In theory the kernel driver should map the dma_buf purely for the
>>    duration that a job is using the buffer (and unmap immediately
>>    after). This gives the exporter the knowledge of when the GPU is
>>    using the memory and allows the exporter to page out the memory if
>>    necessary. In practice this map/unmap operation is expensive
>>    (updating the GPU's page tables), so most drivers don't actually
>>    bother and keep the memory mapped. This means the exporter cannot
>>    tell when the buffer is in use or move the pages.
>
> Yeah, refaulting is too slow if you care the least about performance.
>
>>  * The CPU mappings can be faulted in on demand (performing the
>>    necessary CPU cache invalidate if needed) and shot down to allow
>>    moving the memory. In theory, when the GPU needs the memory it
>>    should map the buffer; the exporter can then shoot down the
>>    mappings, perform the CPU cache clean and then allow the GPU to
>>    use the memory. A subsequent CPU access would then refault the
>>    page, ensuring a CPU cache invalidate so the latest data is
>>    visible.
>
> We thought that was the answer, until it was clear it's not. dma-buf
> mmap isn't coherent, you need to call the begin/end ioctls.
>
>>  * The majority of exporters are simple and deal with uncached memory
>>    (e.g. frame buffers) or are actually exporting back to the same
>>    driver (e.g. window surfaces). In these situations either the
>>    driver already has the necessary "magic" to deal with caches (e.g.
>>    kbase provides explicit cache maintenance operations), or the
>>    memory is "uncached" anyway so it doesn't matter. This means that
>>    hardly anyone tests the complex cases...
>>
>> From a user space ABI perspective, my understanding is that a dma_buf
>> mmap() mapping should be coherent, and user space isn't expected to
>> do anything to make it work. Obviously any importing device might
>> have its own coherency details which will be up to the ABI of that
>> device (e.g. Mali has caches which may need to be flushed - this is
>> usually done at the start/end of a job chain, so coherency is not
>> guaranteed while the job chain is running).
>
> See my other reply, but this isn't correct. dma-buf has explicit
> cache maintenance ops. It's just that generally everyone (ok,
> display-only drivers using the CMA helpers) ends up exporting
> coherent memory and that's why this works. Doesn't make it a bright
> idea imo ...

Sorry, I'd completely forgotten about the DMA_BUF_IOCTL_SYNC ioctl
when I wrote this.

But to a large extent this ship has already sailed: the current users
of dma_buf_mmap() implicitly assume that no sync is necessary, since
there's no mechanism to forward the syncs on to the exporter. Indeed,
the comment in dma-buf.c says:

 * The assumption in the current dma-buf interfaces is that redirecting the
 * initial mmap is all that's needed. A survey of some of the existing
 * subsystems shows that no driver seems to do any nefarious thing like
 * syncing up with outstanding asynchronous processing on the device or
 * allocating special resources at fault time. So hopefully this is good
 * enough, since adding interfaces to intercept pagefaults and allow pte
 * shootdowns would increase the complexity quite a bit.
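For reference, the user space side of that contract looks roughly like
the sketch below: bracketing CPU access to the mapping with
DMA_BUF_IOCTL_SYNC so the exporter can perform cache maintenance. This
is an untested sketch, and cpu_write_pattern()/buf_fd/size are made-up
placeholders for whatever the exporter hands out, but the ioctl and
flags are the ones in the linux/dma-buf.h UAPI:

#include <linux/dma-buf.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Hypothetical helper: fill an exported buffer from the CPU. */
static int cpu_write_pattern(int buf_fd, size_t size)
{
	struct dma_buf_sync sync = { 0 };
	void *map;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   buf_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	/* Tell the exporter CPU access is starting, so it can do any
	 * begin_cpu_access cache maintenance (e.g. invalidate). */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		goto err;

	memset(map, 0xca, size);

	/* ... and that it has finished, so dirty CPU cache lines are
	 * cleaned before the device touches the memory again. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	if (ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		goto err;

	munmap(map, size);
	return 0;
err:
	munmap(map, size);
	return -1;
}

The problem above is that none of this can happen for mappings set up
via dma_buf_mmap(): user space mapped through the importer's handle
has nothing to issue these ioctls on, and the importer has no way to
forward an equivalent sync to the exporter.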
I'd be happy to kill off dma_buf_mmap() and require user space to mmap
via the dma_buf handle (and therefore also to handle the manual sync
ioctls), but I'm not sure how we achieve that while maintaining
backwards compatibility.

Thanks,

Steve

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel