On Wed, 3 Jul 2019 15:50:08 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 03/07/2019 15:33, Boris Brezillon wrote:
> > On Wed, 3 Jul 2019 15:13:25 +0100
> > Steven Price <steven.price@xxxxxxx> wrote:
> >
> >> On 03/07/2019 14:56, Boris Brezillon wrote:
> >>> On Wed, 3 Jul 2019 07:45:32 -0600
> >>> Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> >>>
> >>>> On Wed, Jul 3, 2019 at 7:34 AM Boris Brezillon
> >>>> <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> Exported BOs might be imported back, then mmap()-ed to be written
> >>>>> to. Most drivers handle that by mmap()-ing the GEM handle after it's
> >>>>> been imported, but, according to [1], this is illegal.
> >>>>
> >>>> It's not illegal, but it is supposed to go through the dmabuf mmap
> >>>> functions.
> >>>
> >>> That's basically what I'm proposing here, I just didn't post the patch
> >>> skipping the GET_OFFSET step and doing the mmap() on the dmabuf FD
> >>> instead of the DRM-node one, but I have it working for panfrost.
> >>
> >> If we want to, we could make the Panfrost kernel driver internally call
> >> dma_buf_mmap() so that mapping using the DRM node "just works". This is
> >> indeed what the kbase driver does.
> >
> > Well, userspace should at least skip DRM_IOCTL_PANFROST_MMAP_BO (or
> > ignore its return code), so calling mmap() on the dmabuf FD instead of
> > the DRM-node FD shouldn't be that hard.
>
> What I was suggesting is that user space would still call
> DRM_IOCTL_PANFROST_MMAP_BO to get an offset which it uses in a call to
> mmap(..., drm_node_fd, offset). The kernel could detect that the buffer
> is imported and call the exporter for the actual mmap() functionality.

Oops, sorry, brain fart. I thought it was DRM_IOCTL_PANFROST_MMAP_BO
that was failing, but it's actually the mmap() call, so providing this
wrapper kernel-side should work.
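Something along these lines is what I'd imagine for the wrapper (a
rough, untested sketch: panfrost_gem_mmap() is a made-up name for
whatever gets wired to the DRM node's ->mmap(), and dropping the
object reference that drm_gem_mmap() takes is glossed over):

#include <linux/dma-buf.h>
#include <drm/drm_gem.h>

/* Sketch only: forward mmap() on the DRM node to the exporter
 * when the object was imported.
 */
static int panfrost_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct drm_gem_object *obj;
        int ret;

        /* Validates the fake offset and sets vma->vm_private_data. */
        ret = drm_gem_mmap(filp, vma);
        if (ret)
                return ret;

        obj = vma->vm_private_data;
        if (obj->import_attach) {
                /* dma_buf_mmap() rewrites vm_pgoff with the offset we
                 * pass in (0 here) before calling the exporter's mmap().
                 */
                return dma_buf_mmap(obj->import_attach->dmabuf, vma, 0);
        }

        return 0;
}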
>
> The alternative is that user space 'simply' remembers that a buffer is
> imported and keeps the file descriptor around so that it can instead
> directly mmap() the dma_buf fd. Which is certainly easiest from the
> kernel's perspective (and was what I assumed panfrost was doing - I
> should have checked more closely!).
>
> >>>> However, none of the drivers I've looked at (etnaviv, msm,
> >>>> v3d, vgem) do that. It probably works because it's the same driver
> >>>> doing the import and export, or both drivers have essentially the
> >>>> same implementations.
> >>>
> >>> Yes, but maybe that's something we should start fixing if mmap()-ing
> >>> the dmabuf is the recommended solution.
> >>
> >> I'm open to options here. User space calling mmap() on the dma_buf file
> >> descriptor should always be safe (the exporter can do whatever is
> >> necessary to make it work). If that happens then the patches I posted
> >> close off the DRM node version, which could be broken if the exporter
> >> needs to do anything to prepare the buffer for CPU access (i.e. cache
> >> maintenance).
> >
> > Talking about CPU <-> GPU syncs, I was wondering if the
> > mmap(gem_handle) step was providing any guarantee that would
> > allow us to ignore all the cache maintenance operations that are
> > required when mmap()-ing a dmabuf directly. Note that in both cases the
> > dmabuf is imported.
>
> In theory the exporter should do whatever is required to ensure that the
> CPU is synchronised when a user space mapping exists. There are some
> issues here though:
>
> * In theory the kernel driver should map the dma_buf purely for the
> duration that a job is using the buffer (and unmap immediately after).
> This gives the exporter the knowledge of when the GPU is using the
> memory and allows the exporter to page out the memory if necessary.
> In practice this map/unmap operation is expensive (updating the GPU's
> page tables), so most drivers don't actually bother and keep the memory
> mapped. This means the exporter cannot tell when the buffer is used or
> move the pages.
>
> * The CPU mappings can be faulted in on demand (performing the necessary
> CPU cache invalidate if needed) and shot down to allow moving the
> memory. In theory, when the GPU needs the memory, it should map the
> buffer and the exporter can then shoot down the mappings, perform the
> CPU cache clean and then allow the GPU to use the memory. A subsequent
> CPU access would then refault the page, ensuring a CPU cache invalidate
> so the latest data is visible.
>
> * The majority of exporters are simple and deal with uncached memory
> (e.g. frame buffers) or are actually exporting back to the same driver
> (e.g. window surfaces). In these situations either the driver already
> has the necessary "magic" to deal with caches (e.g. kbase provides
> explicit cache maintenance operations), or it's "uncached" anyway so it
> doesn't matter. This means that hardly anyone tests the complex
> cases...
>
> From a user space ABI perspective, my understanding is that a dma_buf
> mmap() mapping should be coherent, and user space isn't expected to do
> anything to make it work. Obviously any importing device might have its
> own coherency details which will be up to the ABI of that device (e.g.
> Mali has caches which may need to be flushed - this is usually done at
> the start/end of a job chain, so coherency is not guaranteed while the
> job chain is running).

Thanks for the detailed explanation.
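For completeness, here is roughly what the userspace side of the
dmabuf-FD approach could look like. A sketch only: the function name
and its arguments are made up, dmabuf_fd and size are assumed to come
from the import path, and error handling is trimmed. It includes the
DMA_BUF_IOCTL_SYNC bracketing that a non-coherent exporter would rely
on (a nop on coherent ones):

#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Sketch: CPU write to an imported buffer through the dma-buf fd
 * kept around at import time, instead of mmap()-ing the DRM node.
 */
static int write_to_imported_bo(int dmabuf_fd, size_t size,
                                const void *data, size_t len)
{
        struct dma_buf_sync sync = { 0 };
        void *map;

        map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   dmabuf_fd, 0);
        if (map == MAP_FAILED)
                return -1;

        /* Bracket the CPU access so the exporter can do any cache
         * maintenance it needs.
         */
        sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

        memcpy(map, data, len);

        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

        return munmap(map, size);
}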