On Wed, Aug 19, 2015 at 10:08 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> On Wed, Aug 19, 2015 at 03:53:44PM +0200, Tobias Jakobi wrote:
>> Adding Jérôme to Cc. I think he looked at the userptr code before, so maybe
>> he has some idea what is going wrong here.
>>
>> I also had a look at the code, but my knowledge about the DMA API is
>> almost nonexistent. However I can see that before doing any DMA via the
>> G2D on the buffer the code calls dma_map() on it, and also unmaps it
>> when the commandlist is finished.
>>
>>
>> With best wishes,
>> Tobias
>>
>>
>> Tobias Jakobi wrote:
>> > Thanks Lucas for the explanation!
>> >
>> >
>> > Lucas Stach wrote:
>> >> Hi Tobias,
>> >>
>> >> On Sunday, 16.08.2015, 14:48 +0200, Tobias Jakobi wrote:
>> >>> Hello,
>> >>>
>> >>> some time ago I checked whether I could use the userptr functionality to
>> >>> do zero-copy from userspace allocated buffers via the G2D. This didn't
>> >>> work out so well, so I kinda put this to the bottom of my TODO list.
>> >>>
>> >>> Now that IOMMU support has landed and Jan Kara has rewritten page pinning
>> >>> using frame vectors (see [1]) I gave userptr another try.
>> >>>
>> >>> The results are much better. I'm not experiencing any kernel lockups or
>> >>> sysmmu pagefaults anymore. However the image now suffers from visual
>> >>> artifacts. These images show the nature of the artifacts:
>> >>> http://i.imgur.com/nzT6g3Y.jpg
>> >>> http://i.imgur.com/wkuYI6X.jpg
>> >>>
>> >>> The corruption always manifests itself in these pixel lines of fixed
>> >>> size and wrong color.
>> >>>
>> >>> I have written a testcase as part of libdrm for this issue:
>> >>> https://github.com/tobiasjakobi/libdrm/commit/db8bf6844436598251f67a71fc334b929bfb2b71
>> >>>
>> >>> It allocates N (N an even number) buffers which are aligned to the
>> >>> system pagesize.
>> >>> Then it does this each iteration:
>> >>> 1) Fill the first N/2 buffers with random data
>> >>> 2) Copy the first half to the second half of the buffers
>> >>> 3) memcmp() first and second half (verification pass)
>> >>>
>> >>> Usually this verification already fails on the first iteration. An
>> >>> interesting observation is that increasing (!) the buffer size (so the
>> >>> amount of pixels that have to be copied per buffer grows) makes this issue
>> >>> less likely to happen.
>> >>>
>> >>> With the default 512x512 buffers however it happens, like I said above,
>> >>> almost immediately.
>> >>>
>> >> This is obviously a missing cache flush. The memory you get from
>> >> userspace is normal cached memory, so to make it visible to the GPU you
>> >> need to flush parts of the cache out to main memory.
>> >>
>> >> The corruption you are seeing is just unflushed cachelines. This also
>> >> explains why increasing the buffer size helps: the more memory the CPU
>> >> touches the more cachelines will be flushed out to be replaced with new
>> >> data.
>> > I should point out that the snapshots I uploaded were done with a
>> > different setup. There only the source memory of the G2D operation is a
>> > userspace allocated buffer. The destination is a GEM buffer allocated
>> > through libdrm, which is then used as framebuffer. So the issue already
>> > appears when just the source is userspace allocated.
>> >
>
> This is still consistent with a cacheline issue. Are your GPU and IOMMU cache
> coherent with the CPU? If not, then you need to flush the cache for the
> buffer before you use it with the GPU. The DMA API provides a few helpers for
> that.

Although I suspect the DMA API is probably not aware of any device caches
(and I suspect it is a bit weak when it comes to devices that support a mix of
coherent and non-coherent mappings).

BR,
-R

> Cheers,
> Jérôme
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
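
For readers following along, the three-step test loop described in the thread (fill, copy, memcmp) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the libdrm testcase linked above: the names run_iteration, cpu_copy, stale_copy and ASSUMED_CACHELINE are hypothetical, the copy step is a CPU memcpy standing in for the G2D userptr copy, and the 64-byte line size is an assumption. stale_copy deliberately leaves one line-sized chunk uncopied to imitate the kind of mismatch an unflushed cacheline would produce.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption for illustration only; real cacheline sizes vary by CPU. */
#define ASSUMED_CACHELINE 64

/* Hypothetical copy hook. The real test would submit a G2D copy here. */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Stand-in for a correct copy: a plain CPU memcpy. */
static void cpu_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Stand-in for a copy that misses one line-sized chunk at the start,
 * imitating data that never reached the device. */
static void stale_copy(void *dst, const void *src, size_t len)
{
    if (len > ASSUMED_CACHELINE)
        memcpy((char *)dst + ASSUMED_CACHELINE,
               (const char *)src + ASSUMED_CACHELINE,
               len - ASSUMED_CACHELINE);
}

/* Runs one iteration over n_buffers page-aligned buffers of buf_size bytes.
 * Returns the number of mismatching buffer pairs, or -1 on setup failure. */
static int run_iteration(size_t n_buffers, size_t buf_size, copy_fn copy)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t half = n_buffers / 2;
    unsigned char **bufs;
    int mismatches = 0;
    size_t i, j;

    assert(n_buffers >= 2 && n_buffers % 2 == 0);
    if (page <= 0)
        return -1;

    bufs = calloc(n_buffers, sizeof(*bufs));
    if (!bufs)
        return -1;

    for (i = 0; i < n_buffers; i++) {
        void *p = NULL;

        if (posix_memalign(&p, (size_t)page, buf_size) != 0) {
            mismatches = -1;
            goto out;
        }
        bufs[i] = p;
        /* Zero the buffer so leftover heap contents cannot match by luck. */
        memset(bufs[i], 0, buf_size);
    }

    for (i = 0; i < half; i++) {
        /* 1) Fill a buffer from the first half with random data. */
        for (j = 0; j < buf_size; j++)
            bufs[i][j] = (unsigned char)(rand() & 0xff);
        /* 2) Copy it to its partner in the second half. */
        copy(bufs[half + i], bufs[i], buf_size);
        /* 3) Verification pass. */
        if (memcmp(bufs[i], bufs[half + i], buf_size) != 0)
            mismatches++;
    }

out:
    for (i = 0; i < n_buffers; i++)
        free(bufs[i]); /* free(NULL) is a no-op for unallocated slots */
    free(bufs);
    return mismatches;
}
```

Called from a main(), run_iteration(4, 512 * 512 * 4, cpu_copy) exercises the "default 512x512" case, assuming 4 bytes per pixel (the thread does not state the pixel format). Passing stale_copy instead should report every pair as mismatched, which is the failure signature the memcmp() pass is meant to catch.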