On Fri, Nov 30, 2012 at 4:39 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 11/29/2012 10:58 PM, Marek Olšák wrote:
>>
>>
>> What I tried to point out was that the synchronization shouldn't be
>> needed, because the CPU shouldn't do anything with the contents of
>> evicted buffers. The GPU moves the buffers, not the CPU. What does the
>> CPU do besides updating some kernel structures?
>>
>> Also, buffer deletion is something where you don't need to wait for
>> the buffer to become idle if you know the memory area won't be
>> mapped by the CPU, ever. The memory can be reclaimed right away. It
>> would be the GPU to move new data in and once that happens, the old
>> buffer will be trivially idle, because single-ring GPUs execute
>> commands in order.
>>
>> Marek
>
>
> Actually asynchronous eviction / deletion is something I have been
> prototyping for a while but never got around to implementing in TTM:
>
> There are a few minor caveats:
>
> With buffer deletion, what you say is true for fixed memory, but not for TT
> memory where pages are reclaimed by the system after buffer destruction.
> That means that we don't have to wait for idle to free GPU space, but we
> need to wait before pages are handed back to the system.
>
> Swapout needs to access the contents of evicted buffers, but synchronizing
> doesn't need to happen until just before swapout.
>
> Multi-ring - CPU support: If another ring / engine or the CPU is about to
> move in buffer contents to VRAM or a GPU aperture that was previously
> evicted by another ring, it needs to sync with that eviction, but doesn't
> know what buffer or even which buffers occupied the space previously.
> Trivially one can attach a sync object to the memory type manager that
> represents the last eviction from that memory type, and *any* engine (CPU or
> GPU) that moves buffer contents in needs to order that movement with respect
> to that fence.
> As you say, with a single ring and no CPU fallbacks, that
> ordering is a no-op, but any common (non-driver based) implementation needs
> to support this.
>
> A single fence attached to the memory type manager is the simplest solution,
> but a solution with a fence for each free region in the free list is also
> possible. Then TTM needs a driver callback to be able to order fences w.r.t.
> each other.
>
> /Thomas
>

Radeon already handles multi-ring and TTM interaction with what we call
semaphores. Semaphores are created to synchronize with fences across
different rings. I think the easiest solution is to just remove the BO wait
in TTM and let the driver handle this.

Cheers,
Jerome

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
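Thomas's single-fence proposal, combined with the cross-ring waits Jerome describes, can be sketched as a small user-space C model. Everything below is hypothetical: `struct mem_type_manager`, `last_eviction`, `evict_async()`, `order_move_in()` and the seqno-based fences are illustrative names, not TTM or radeon API, and a real driver would emit a semaphore wait or block on a fence rather than return an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 2
#define ENGINE_CPU (-1) /* hypothetical id for CPU copies (not a ring) */
#define NO_RING (-2)    /* hypothetical marker: no eviction fence stored */

/* Hypothetical fence: the ring that signals it and the seqno it signals at. */
struct fence {
	int ring;
	uint32_t seqno;
};

/* Hypothetical device state: per-ring emitted and completed seqnos. */
struct gpu_device {
	uint32_t emitted[NUM_RINGS];
	uint32_t completed[NUM_RINGS];
};

/* Hypothetical stand-in for a memory type manager (e.g. VRAM). */
struct mem_type_manager {
	struct fence last_eviction; /* fence of the most recent eviction */
};

enum sync_action {
	SYNC_NONE,      /* no eviction pending, or it has already signaled */
	SYNC_IN_ORDER,  /* same ring: in-order execution already orders it */
	SYNC_CPU_WAIT,  /* CPU fallback must block until the fence signals */
	SYNC_RING_WAIT  /* another ring must wait, e.g. on a semaphore */
};

/* Ignores seqno wraparound for brevity. */
static bool fence_signaled(const struct gpu_device *dev, struct fence f)
{
	if (f.ring == NO_RING)
		return true;
	return dev->completed[f.ring] >= f.seqno;
}

/* How must 'engine' order a move-in against the last eviction? It does not
 * need to know which buffers were evicted, only the manager's fence. */
static enum sync_action order_move_in(const struct gpu_device *dev,
				      const struct mem_type_manager *man,
				      int engine)
{
	if (fence_signaled(dev, man->last_eviction))
		return SYNC_NONE;
	if (engine == man->last_eviction.ring)
		return SYNC_IN_ORDER;
	if (engine == ENGINE_CPU)
		return SYNC_CPU_WAIT;
	return SYNC_RING_WAIT;
}

/* Queue an eviction on 'ring' without waiting for it; only record its fence.
 * The returned action orders this eviction after the previous one, so the
 * single stored fence transitively covers every earlier eviction. */
static enum sync_action evict_async(struct gpu_device *dev,
				    struct mem_type_manager *man, int ring)
{
	assert(ring >= 0 && ring < NUM_RINGS);
	enum sync_action before = order_move_in(dev, man, ring);

	man->last_eviction.ring = ring;
	man->last_eviction.seqno = ++dev->emitted[ring];
	return before;
}
```

With a single ring and no CPU fallback, every move-in yields `SYNC_IN_ORDER` or `SYNC_NONE`, which is the no-op ordering Thomas mentions. Serializing each eviction behind the previous one is what lets one fence stand in for all of them; a fence per free region would avoid that serialization at the cost of the driver callback for ordering fences.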