On Thu, Dec 17, 2015 at 10:58:20PM +0100, Thomas Hellstrom wrote:
> On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
> > From: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >
> > The userspace might need some sort of cache coherency management e.g. when CPU
> > and GPU domains are being accessed through dma-buf at the same time. To
> > circumvent this problem there are begin/end coherency markers, that forward
> > directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
> > of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
> > used like following:
> >      - mmap dma-buf fd
> >      - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> >        to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> >        want (with the new data being consumed by the GPU or say scanout device)
> >      - munmap once you don't need the buffer any more
> >
> > v2 (Tiago): Fix header file type names (u64 -> __u64)
> > v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
> > dma-buf functions. Check for overflows in start/length.
> > v4 (Tiago): use 2d regions for sync.
> > v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
> > remove range information from struct dma_buf_sync.
> > v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
> > documentation about the recommendation on using sync ioctls.
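[For readers without the full patch in front of them: the uapi additions the commit message describes amount to roughly the sketch below. This is a reconstruction from the cover text, not the literal patch hunks; the exact flag values and ioctl number are assumptions based on the v6 description (__u64 padded flags, _IOW), so check them against the actual include/uapi/linux/dma-buf.h.]

```c
/* Sketch of the proposed include/uapi/linux/dma-buf.h additions.
 * Names and values are assumptions reconstructed from the cover
 * letter, not a copy of the patch itself. */
#include <linux/ioctl.h>
#include <linux/types.h>

/* begin/end cpu access via the DMA_BUF_IOCTL_SYNC ioctl */
struct dma_buf_sync {
	__u64 flags;	/* __u64 structured padded flags, as per v6 */
};

/* Direction of the CPU access being bracketed */
#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
/* Begin (SYNC_START) / end (SYNC_END) coherency markers */
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)

/* _IOW, since v5 dropped the range information from the struct */
#define DMA_BUF_IOCTL_SYNC	_IOW('b', 0, struct dma_buf_sync)
```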
> >
> > Cc: Sumit Semwal <sumit.semwal@xxxxxxxxxx>
> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > Signed-off-by: Tiago Vignatti <tiago.vignatti@xxxxxxxxx>
> > ---
> >  Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++-
> >  drivers/dma-buf/dma-buf.c         | 43 +++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/dma-buf.h      | 38 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 102 insertions(+), 1 deletion(-)
> >  create mode 100644 include/uapi/linux/dma-buf.h
> >
> > diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> > index 4f4a84b..2ddd4b2 100644
> > --- a/Documentation/dma-buf-sharing.txt
> > +++ b/Documentation/dma-buf-sharing.txt
> > @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
> >     handles, too). So it's beneficial to support this in a similar fashion on
> >     dma-buf to have a good transition path for existing Android userspace.
> >
> > -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
> > +   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
> > +   important to note though is that, even if it is not mandatory, the userspace
> > +   is strongly recommended to always use the cache synchronization ioctl
> > +   (DMA_BUF_IOCTL_SYNC) discussed next.
> > +
> > +   Some systems might need some sort of cache coherency management e.g. when
> > +   CPU and GPU domains are being accessed through dma-buf at the same time. To
> > +   circumvent this problem there are begin/end coherency markers, that forward
> > +   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
> > +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
> > +   would be used like following:
> > +     - mmap dma-buf fd
> > +     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
> > +       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
> > +       want (with the new data being consumed by the GPU or say scanout device)
> > +     - munmap once you don't need the buffer any more
> > +
> > +   In principle systems with the memory cache shared by the GPU and CPU may
> > +   not need SYNC_START and SYNC_END but still, userspace is always encouraged
> > +   to use these ioctls before and after, respectively, when accessing the
> > +   mapped address.
>
> I think the wording here is far too weak. If this is a generic
> user-space interface and syncing is required for
> a) Correctness: then syncing must be mandatory.
> b) Optimal performance: then an implementation must generate expected
> results also in the absence of SYNC ioctls, but is allowed to rely on
> correct pairing of SYNC_START and SYNC_END to render correctly.

Yeah, I guess the wording should be stronger to make it clear you better
call these, always.

> Also recalling the discussion of multidimensional sync, which we said we
> would add when it was needed, my worst nightmare is when (not if) there
> are a number of important applications starting to abuse this interface
> for small writes or reads to large DMA buffers. It will work flawlessly
> on coherent architectures, and probably some incoherent architectures as
> well, but will probably be mostly useless on VMware. What is the plan
> for adding 2D sync to make it work? How do we persuade app developers to
> think of damage regions, and can we count on support from the DRM
> community when this happens?

The current use-case in cros seems to be to do full-blown buffer
uploads, nothing partial. I don't think there's anything we can do
really (flushing for small uploads will make i915 crawl too, albeit you
can still see a slideshow and it's not a complete stop), except maybe
improve the docs with a statement that trying to use dma-buf mmap for
small uploads will result in horrible performance.
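[As an aside, the mmap/SYNC_START/write/SYNC_END/munmap sequence the documentation recommends could be sketched in userspace roughly as below. The flag names mirror the proposed uapi; the `dma_buf_sync()` and `upload()` helpers are invented here for illustration, and error handling is reduced to the bare minimum.]

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirrors of the proposed uapi (assumed values), so the sketch
 * is self-contained even without <linux/dma-buf.h>. */
struct dma_buf_sync { uint64_t flags; };
#define DMA_BUF_SYNC_WRITE  (2 << 0)
#define DMA_BUF_SYNC_START  (0 << 2)
#define DMA_BUF_SYNC_END    (1 << 2)
#define DMA_BUF_IOCTL_SYNC  _IOW('b', 0, struct dma_buf_sync)

/* Hypothetical helper: issue one sync ioctl on a dma-buf fd. */
static int dma_buf_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	return ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
}

/* One CPU upload cycle: SYNC_START, write, SYNC_END.  'map' is the
 * pointer previously returned by mmap() on dmabuf_fd; the cycle can be
 * repeated per frame, and munmap() happens only when the buffer is no
 * longer needed. */
static int upload(int dmabuf_fd, void *map, const void *src, size_t len)
{
	if (dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE))
		return -1;
	memcpy(map, src, len);
	return dma_buf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```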
I think the inevitable reality is that some truly horrible software will
always ship. I am pretty hopeful though that this won't be too common,
since the main users for anything dma-buf are on mobile, and those
systems (at least on i915) are all incoherent. So as long as it'll run
acceptably on i915 I think it should run at least ok-ish on vmwgfx too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel