On Tuesday, 2013-10-01 at 13:13 +0200, Thomas Hellstrom wrote:
> On 10/01/2013 12:34 PM, Lucas Stach wrote:
> > On Tuesday, 2013-10-01 at 12:16 +0200, Thomas Hellstrom wrote:
> >> Jerome, Konrad,
> >>
> >> Forgive an ignorant question, but it appears that both Nouveau and
> >> Radeon may use pci_map_page() when populating TTMs with pages
> >> obtained from the ordinary pool (not the DMA pool). If I understand
> >> things correctly, these pages will not have been allocated with
> >> dma_alloc_coherent().
> >>
> >> From what I understand, at least for the corresponding
> >> dma_map_page(), it's illegal for the CPU to access these pages
> >> without calling dma_sync_xx_for_cpu(), and before the device is
> >> allowed to access them again, you need to call
> >> dma_sync_xx_for_device(). So mapping for PCI really invalidates the
> >> TTM interleaved CPU / device access model.
> >>
> > That's right. The API says you need to sync for device or CPU, but
> > on x86 you can get away with not doing so, as there the calls end up
> > being just write-buffer flushes.
>
> OK, but what about the cases where the DMA subsystem allocates a
> bounce buffer? (Although I think the TTM page selection works around
> this situation.) Perhaps at the very least this deserves a comment in
> the code...

Not doing the sync_for_* calls is always a violation of the DMA mapping
API and will rightfully fail on systems that rely on those mechanisms
for proper DMA memory handling; bounce buffers are just one of those
cases.

> >
> > For ARM, or similarly non-coherent architectures, you absolutely
> > have to do the syncs, or you'll end up with different contents in
> > the cache vs. system RAM. For my nouveau-on-ARM work I introduced
> > some simple helpers to do the right thing. And it really isn't hard
> > to do the syncs at the right points in time: just sync for CPU when
> > getting a cpu_prep ioctl, and sync for device when validating a
> > buffer for GPU use.
>
> Yes, this will probably work for drivers where a buffer is bound
> either for the CPU or for the GPU; however, for drivers using
> user-space sub-allocation of buffers, or for partial updates of
> vertex buffers etc., that isn't sufficient. In that case one either
> has to use coherent memory, or implement an elaborate scheme where we
> sync for device and kill user-space mappings on validation, and sync
> for CPU in the CPU fault handler. Unfortunately the latter triggers a
> fence wait for the whole buffer, not just the part of the buffer we
> want to write to.
>
Yeah, either you have to use DMA-coherent memory, or implement some
scheme where you only sync subregions of a buffer. Though having to
call a cpu_prepare_subbuffer ioctl might just kill all the benefits you
got from userspace suballocation, so using coherent memory for those
buffers seems like a safe bet.

I already implemented some of this in the nouveau nv50 Mesa driver,
which uses userspace suballocation, but unfortunately I can't do any
serious performance measurements, as the system setup has other
unrelated bottlenecks.

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
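
For reference, a minimal sketch of the sync-helper pattern described
above: sync for CPU before a cpu_prep ioctl returns, and sync for
device when the buffer is validated for GPU use. The helper names and
the flat dma_addrs array are assumptions for illustration (roughly the
per-page dma_addr_t storage a TTM backend would keep after
dma_map_page()); this is not the actual nouveau code.

```c
/*
 * Illustrative sync helpers for pages that a TTM backend mapped with
 * dma_map_page() on a non-coherent arch. The dma_addrs array stands in
 * for whatever per-page dma_addr_t storage the driver keeps; the
 * drv_bo_* names are hypothetical.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Before the CPU touches the buffer, e.g. when a cpu_prep ioctl comes in. */
static void drv_bo_sync_for_cpu(struct device *dev, dma_addr_t *dma_addrs,
				unsigned long num_pages)
{
	unsigned long i;

	for (i = 0; i < num_pages; i++)
		dma_sync_single_for_cpu(dev, dma_addrs[i], PAGE_SIZE,
					DMA_BIDIRECTIONAL);
}

/* Before handing the buffer back to the GPU, e.g. at buffer validation. */
static void drv_bo_sync_for_device(struct device *dev, dma_addr_t *dma_addrs,
				   unsigned long num_pages)
{
	unsigned long i;

	for (i = 0; i < num_pages; i++)
		dma_sync_single_for_device(dev, dma_addrs[i], PAGE_SIZE,
					   DMA_BIDIRECTIONAL);
}
```

Syncing page-by-page matches how the pages were mapped; a driver that
keeps an sg_table for the buffer could instead use
dma_sync_sg_for_cpu()/dma_sync_sg_for_device() over the whole list.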