On 08/20/2015 09:27 PM, Thomas Hellstrom wrote:
> On 08/20/2015 04:33 PM, Rob Clark wrote:
>> On Thu, Aug 20, 2015 at 2:48 AM, Thomas Hellstrom <thellstrom@xxxxxxxxxx> wrote:
>>> Hi, Tiago!
>>>
>>> On 08/20/2015 12:33 AM, Tiago Vignatti wrote:
>>>> Hey Thomas, you haven't answered my email about making SYNC_* mandatory:
>>>>
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/088376.html
>>> Hmm, for some reason it doesn't show up in my mail app, but I found it
>>> in the archives. An attempt to explain the situation from the vmwgfx
>>> perspective.
>>>
>>> The fact that the interface is generic means that people will start
>>> using it for the zero-copy case. There has been a couple of more or less
>>> hackish attempts to do this before, and if it's a _driver_ interface we
>>> don't need to be that careful but if it is a _generic_ interface we need
>>> to be very careful to make it fit *all* the hardware out there and that
>>> we make all potential users use the interface in a way that conforms
>>> with the interface specification.
>>>
>>> What will happen otherwise is that apps written for coherent fast
>>> hardware might, for example, ignore calling the SYNC api, just because
>>> the app writer only cared about his own hardware on which the app works
>>> fine. That would fail miserably if the same app was run on incoherent
>>> hardware, or the incoherent hardware driver maintainers would be forced
>>> to base an implementation on page-faults which would be very slow.
>>>
>>> So assume the following use case: An app updates a 10x10 area using the
>>> CPU on a 1600x1200 dma-buf, and it will then use the dma-buf for
>>> texturing. On some hardware the dma-buf might be tiled in a very
>>> specific way, on vmwgfx the dma-buf is a GPU buffer on the host, only
>>> accessible using DMA.
>>> On vmwgfx the SYNC operation must carry out a
>>> 10x10 DMA from the host GPU buffer to a guest CPU buffer before the CPU
>>> write and a DMA back again after the write, before GPU usage. On the
>>> tiled architecture the SYNC operation must untile before CPU access and
>>> probably tile again before GPU access.
>>>
>>> If we now have a one-dimensional SYNC api, in this particular case we'd
>>> either need to sync a far too large area (1600x10) or call SYNC 10 times
>>> before writing, and then again after writing. If the app forgot to call
>>> SYNC we must error.
>> just curious, but couldn't you batch up the 10 10x1 sync's?
> Yes that would work up to the first CPU access. Subsequent syncs would
> need to be carried out immediately or all ptes would need to be unmapped
> to detect the next CPU access. Write only syncs could probably be
> batched unconditionally.
>
> /Thomas

But aside from the problem of subsequent syncs after first CPU access,
does user-space really want to call sync for each line? Probably not,
but that's a problem that can be postponed (2D sync getting a separate
IOCTL) until someone gets tired of calling 1D syncs.

My feeling is, however that that will happen rather quickly and at least
2D syncs will be a common usecase.

/Thomas

>
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/dri-devel