On Thu, Aug 20, 2015 at 3:27 PM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 08/20/2015 04:33 PM, Rob Clark wrote:
>> On Thu, Aug 20, 2015 at 2:48 AM, Thomas Hellstrom <thellstrom@xxxxxxxxxx> wrote:
>>> Hi, Tiago!
>>>
>>> On 08/20/2015 12:33 AM, Tiago Vignatti wrote:
>>>> Hey Thomas, you haven't answered my email about making SYNC_* mandatory:
>>>>
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/088376.html
>>> Hmm, for some reason it doesn't show up in my mail app, but I found it
>>> in the archives. An attempt to explain the situation from the vmwgfx
>>> perspective.
>>>
>>> The fact that the interface is generic means that people will start
>>> using it for the zero-copy case. There have been a couple of more or
>>> less hackish attempts to do this before. If it's a _driver_ interface
>>> we don't need to be that careful, but if it is a _generic_ interface we
>>> need to be very careful to make it fit *all* the hardware out there,
>>> and to make all potential users use the interface in a way that
>>> conforms with the interface specification.
>>>
>>> What will happen otherwise is that apps written for coherent, fast
>>> hardware might, for example, ignore calling the SYNC api, just because
>>> the app writer only cared about his own hardware, on which the app
>>> works fine. That would fail miserably if the same app were run on
>>> incoherent hardware, or the incoherent-hardware driver maintainers
>>> would be forced to base an implementation on page faults, which would
>>> be very slow.
>>>
>>> So assume the following use case: an app updates a 10x10 area using the
>>> CPU on a 1600x1200 dma-buf, and it will then use the dma-buf for
>>> texturing. On some hardware the dma-buf might be tiled in a very
>>> specific way; on vmwgfx the dma-buf is a GPU buffer on the host, only
>>> accessible using DMA.
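To make the cost of the 10x10-update use case concrete, here is a small back-of-the-envelope model (plain Python, not any real API; the linear layout and 4-bytes-per-pixel format are assumptions) comparing how many bytes a 2D-aware SYNC must move for the dirty rectangle versus what a single one-dimensional byte-range sync has to cover in a 1600x1200 buffer:

```python
# Hypothetical cost model for syncing a small sub-rectangle of a dma-buf.
# Assumes a linear layout with a 1600-pixel pitch and 4 bytes per pixel
# (illustrative values only).

PITCH_PIXELS = 1600
BPP = 4  # bytes per pixel (assumed)

def sync_bytes_2d(w, h, bpp=BPP):
    """Bytes a 2D-aware SYNC transfers: exactly the dirty rectangle."""
    return w * h * bpp

def sync_bytes_1d(w, h, pitch=PITCH_PIXELS, bpp=BPP):
    """Bytes one contiguous 1D byte-range must cover: from the first byte
    of the rectangle's first row to the last byte of its last row,
    including all the unrelated pixels in between."""
    return ((h - 1) * pitch + w) * bpp

dirty = sync_bytes_2d(10, 10)     # 400 bytes actually touched
covered = sync_bytes_1d(10, 10)   # 57640 bytes, roughly the "1600x10" area
print(dirty, covered, covered // dirty)
```

The single 1D range ends up well over a hundred times larger than the data actually written, which is the "far too large area (1600x10)" problem; the alternative is ten separate row syncs.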
>>> On vmwgfx the SYNC operation must carry out a
>>> 10x10 DMA from the host GPU buffer to a guest CPU buffer before the
>>> CPU write, and a DMA back again after the write, before GPU usage. On
>>> the tiled architecture the SYNC operation must untile before CPU
>>> access and probably tile again before GPU access.
>>>
>>> If we now have a one-dimensional SYNC api, in this particular case
>>> we'd either need to sync a far too large area (1600x10) or call SYNC
>>> 10 times before writing, and then again after writing. If the app
>>> forgot to call SYNC we must error.
>> just curious, but couldn't you batch up the 10 10x1 syncs?
>
> Yes, that would work up to the first CPU access. Subsequent syncs would
> need to be carried out immediately, or all ptes would need to be
> unmapped to detect the next CPU access. Write-only syncs could probably
> be batched unconditionally.

hmm, maybe another cpu barrier ioctl? I mean, if we had a 2d sync API
and then needed to update layers in a 3d or cubemap texture, we'd need
to do multiple 2d updates.. but what about instead having something
like:

  ioctl(SYNC)
  ioctl(SYNC)
  ioctl(SYNC)
  ioctl(PREP)
  ... cpu access
  ioctl(FINI)

(or something roughly like that)

BR,
-R

> /Thomas
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
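The PREP/FINI sketch above can be modeled as a toy (plain Python; the class, method names, and counters are all hypothetical illustrations, not a real kernel interface): SYNC calls only queue dirty regions, PREP acts as the CPU-access barrier where one batched transfer actually happens, and FINI transfers the written data back before GPU use:

```python
# Toy model of the proposed flow: SYNC queues regions cheaply, PREP is
# the barrier that flushes the whole batch in one DMA pass before CPU
# access, and FINI writes the data back for GPU use afterwards. All
# names and semantics here are illustrative assumptions.

class BatchedSyncBuf:
    def __init__(self):
        self.pending = []   # regions queued by SYNC, not yet transferred
        self.dma_ops = 0    # host<->guest transfers actually performed

    def sync(self, x, y, w, h):
        # Cheap: just record the dirty region; no DMA yet.
        self.pending.append((x, y, w, h))

    def prep(self):
        # Barrier before CPU access: one DMA pass covering the batch.
        if self.pending:
            self.dma_ops += 1
            self.pending = []

    def fini(self):
        # After CPU access: DMA the written data back before GPU use.
        self.dma_ops += 1

buf = BatchedSyncBuf()
for layer in range(3):       # e.g. three layers of a 3D/cubemap texture
    buf.sync(0, 0, 10, 10)
buf.prep()
# ... CPU writes happen here ...
buf.fini()
print(buf.dma_ops)  # 2 transfers total, instead of one per SYNC call
```

This captures why the extra barrier helps on hardware like vmwgfx: the per-region bookkeeping stays cheap, and the expensive transfers collapse to one in each direction regardless of how many regions were queued.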