On Fri, Nov 30, 2012 at 12:08 PM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 11/30/2012 05:30 PM, Jerome Glisse wrote:
>>
>> On Fri, Nov 30, 2012 at 4:39 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
>>>
>>> On 11/29/2012 10:58 PM, Marek Olšák wrote:
>>>>
>>>> What I tried to point out was that the synchronization shouldn't be
>>>> needed, because the CPU shouldn't do anything with the contents of
>>>> evicted buffers. The GPU moves the buffers, not the CPU. What does the
>>>> CPU do besides updating some kernel structures?
>>>>
>>>> Also, buffer deletion is something where you don't need to wait for
>>>> the buffer to become idle if you know the memory area won't be
>>>> mapped by the CPU, ever. The memory can be reclaimed right away. It
>>>> would be the GPU that moves new data in, and once that happens, the old
>>>> buffer will be trivially idle, because single-ring GPUs execute
>>>> commands in order.
>>>>
>>>> Marek
>>>
>>> Actually, asynchronous eviction / deletion is something I have been
>>> prototyping for a while but never gotten around to implementing in TTM.
>>>
>>> There are a few minor caveats:
>>>
>>> With buffer deletion, what you say is true for fixed memory, but not for
>>> TT memory, where pages are reclaimed by the system after buffer
>>> destruction. That means that we don't have to wait for idle to free GPU
>>> space, but we need to wait before pages are handed back to the system.
>>>
>>> Swapout needs to access the contents of evicted buffers, but
>>> synchronizing doesn't need to happen until just before swapout.
>>>
>>> Multi-ring / CPU support: if another ring / engine or the CPU is about
>>> to move buffer contents into VRAM or a GPU aperture that was previously
>>> evicted by another ring, it needs to sync with that eviction, but it
>>> doesn't know which buffer or even which buffers occupied the space
>>> previously. Trivially, one can attach a sync object to the memory type
>>> manager that represents the last eviction from that memory type, and
>>> *any* engine (CPU or GPU) that moves buffer contents in needs to order
>>> that movement with respect to that fence. As you say, with a single ring
>>> and no CPU fallbacks, that ordering is a no-op, but any common
>>> (non-driver-based) implementation needs to support this.
>>>
>>> A single fence attached to the memory type manager is the simplest
>>> solution, but a solution with a fence for each free region in the free
>>> list is also possible. Then TTM needs a driver callback to be able to
>>> order fences with respect to each other.
>>>
>>> /Thomas
>>>
>> Radeon already handles multi-ring and TTM interaction with what we call
>> semaphores. Semaphores are created to synchronize with fences across
>> different rings. I think the easiest solution is to just remove the bo
>> wait in TTM and let the driver handle this.
>
> The wait can be removed, but only conditioned on a driver flag that says
> it supports asynchronous buffer moves.
>
> The multi-ring case I'm talking about is:
>
> Ring 1 evicts buffer A, emits fence 0.
> Ring 2 evicts buffer B, emits fence 1.
> ...Other evictions take place on various rings, perhaps including ring 1
> and ring 2.
> Ring 3 moves buffer C into the space which happens to be the union of the
> space previously occupied by buffer A and buffer B.
>
> The question is: which fence do you want to order this move with?
> The answer is whichever of fence 0 and 1 signals last.
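
A rough sketch of what that bookkeeping could look like (hypothetical names
only, not the actual TTM or radeon interface): the memory type manager keeps
one eviction fence, the driver supplies a callback that picks whichever of
two fences signals last, and every move-in orders itself against the current
fence.

/* Purely illustrative, hypothetical types; not the actual TTM interface. */

struct evict_fence {                    /* stand-in for a driver fence */
	unsigned int ring;
	unsigned long seqno;
};

struct mem_type_manager {               /* stand-in for a TTM memory type manager */
	struct evict_fence *last_evict; /* fence of the most recent eviction */
};

struct driver_order_ops {
	/* Return whichever of the two fences signals last. */
	struct evict_fence *(*later)(struct evict_fence *a,
				     struct evict_fence *b);
	/* Make dst_ring (or the CPU) order itself against fence f. */
	void (*sync_to)(unsigned int dst_ring, struct evict_fence *f);
};

/* Called whenever a ring evicts a buffer from this memory type. */
static void note_eviction(struct mem_type_manager *man,
			  const struct driver_order_ops *ops,
			  struct evict_fence *f)
{
	man->last_evict = man->last_evict ? ops->later(man->last_evict, f) : f;
}

/* Called before any engine moves new contents into this memory type;
 * with a single ring the driver can make this a no-op. */
static void order_move_in(struct mem_type_manager *man,
			  const struct driver_order_ops *ops,
			  unsigned int dst_ring)
{
	if (man->last_evict)
		ops->sync_to(dst_ring, man->last_evict);
}

The per-free-region variant would keep one such fence on each free-list
entry instead of a single one on the whole manager.
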
>
> I think it's a reasonable thing for TTM to keep track of this, but in
> order to do so it needs a driver callback that can order two fences, and
> can order a job in the current ring with respect to a fence. In radeon's
> case that driver callback would probably insert a barrier / semaphore. In
> the case of simpler hardware it would wait on one of the fences.
>
> /Thomas
>

I don't think we can order fences easily with a clean API. I would rather
see TTM provide a list of fences to the driver and tell the driver that,
before moving this object, all the fences on this list need to be completed.
I think it's as easy as associating fences with drm_mm (well, nouveau has
its own mm stuff), but the idea would basically be that fences are
associated both with the bo and with the mm object, so you know when a
segment of memory is idle/available for use.
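
As a rough illustration of that idea (all names here are hypothetical, not
the real drm_mm or TTM API), each free segment of the memory manager could
carry the fences emitted by the evictions that freed it, and the driver
would be handed that list before anything is moved in:

#include <stdbool.h>

#define MAX_FENCES_PER_SEGMENT 8

struct seg_fence {                      /* stand-in for a driver fence */
	unsigned int ring;
	unsigned long seqno;
};

struct mm_segment {                     /* stand-in for a free drm_mm region */
	unsigned long start, size;
	struct seg_fence *fences[MAX_FENCES_PER_SEGMENT];
	unsigned int num_fences;
};

/* Record the fence of an eviction on the segment it freed. */
static bool segment_add_fence(struct mm_segment *seg, struct seg_fence *f)
{
	if (seg->num_fences >= MAX_FENCES_PER_SEGMENT)
		return false;           /* real code would wait or merge here */
	seg->fences[seg->num_fences++] = f;
	return true;
}

/* Driver decides how to honor each fence: CPU wait or a ring semaphore. */
typedef void (*sync_fn)(unsigned int dst_ring, struct seg_fence *f);

/* What TTM would call just before moving a buffer into 'seg'. */
static void segment_sync_before_move(struct mm_segment *seg,
				     unsigned int dst_ring, sync_fn sync)
{
	unsigned int i;

	for (i = 0; i < seg->num_fences; i++)
		sync(dst_ring, seg->fences[i]);
	seg->num_fences = 0;            /* all evictions ordered; segment reusable */
}

On a single-ring GPU with no CPU fallbacks the driver's sync callback can
simply be empty, which is the "ordering is a no-op" case Thomas mentions
above.

Cheers,
Jerome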