Re: Asynchronous eviction [WAS Re: [PATCH] drm/ttm: add minimum residency constraint for bo eviction]

Jerome Glisse <j.glisse@xxxxxxxxx> · Fri, 30 Nov 2012 11:30:42 -0500

On Fri, Nov 30, 2012 at 4:39 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 11/29/2012 10:58 PM, Marek Olšák wrote:
>>
>>
>> What I tried to point out was that the synchronization shouldn't be
>> needed, because the CPU shouldn't do anything with the contents of
>> evicted buffers. The GPU moves the buffers, not the CPU. What does the
>> CPU do besides updating some kernel structures?
>>
>> Also, buffer deletion is something where you don't need to wait for
>> the buffer to become idle if you know the memory area won't be
>> mapped by the CPU, ever. The memory can be reclaimed right away. It
>> would be the GPU to move new data in and once that happens, the old
>> buffer will be trivially idle, because single-ring GPUs execute
>> commands in order.
>>
>> Marek
>
>
> Actually asynchronous eviction / deletion is something I have been
> prototyping for a while but never got around to implementing in TTM.
>
> There are a few minor caveats:
>
> With buffer deletion, what you say is true for fixed memory, but not for TT
> memory where pages are reclaimed by the system after buffer destruction.
> That means that we don't have to wait for idle to free GPU space, but we
> need to wait before pages are handed back to the system.
>
> Swapout needs to access the contents of evicted buffers, but synchronizing
> doesn't need to happen until just before swapout.
>
> Multi-ring - CPU support: If another ring / engine or the CPU is about to
> move in buffer contents to VRAM or a GPU aperture that was previously
> evicted by another ring, it needs to sync with that eviction, but doesn't
> know what buffer or even which buffers occupied the space previously.
> Trivially one can attach a sync object to the memory type manager that
> represents the last eviction from that memory type, and *any* engine (CPU or
> GPU) that moves buffer contents in needs to order that movement with respect
> to that fence. As you say, with a single ring and no CPU fallbacks, that
> ordering is a no-op, but any common (non-driver based) implementation needs
> to support this.
>
> A single fence attached to the memory type manager is the simplest solution,
> but a solution with a fence for each free region in the free list is also
> possible. Then TTM needs a driver callback to be able to order fences
> w.r.t. each other.
>
> /Thomas
>

Radeon already handles multi-ring and TTM interaction with what we call
semaphores. Semaphores are created to synchronize with fences across
different rings. I think the easiest solution is to just remove the bo
wait in TTM and let the driver handle this.
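
To illustrate the idea, here is a rough sketch only, not actual radeon
or TTM code. Every name in it (struct ring, ring_sync_to, evict_async,
move_in, struct mem_manager) is hypothetical, and a fence is assumed to
be just a (ring, sequence number) pair. The manager remembers the fence
of the last eviction, and whoever moves data in orders itself against
that fence. On the same ring this is a no-op; across rings it queues a
semaphore wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model for illustration; not radeon or TTM code. */
#define NUM_RINGS 2

struct fence {
    int ring;      /* ring that signals this fence */
    uint64_t seq;  /* sequence number on that ring; 0 means "no fence" */
};

struct ring {
    uint64_t emitted;             /* last sequence number emitted */
    uint64_t wait_seq[NUM_RINGS]; /* highest seq of each ring we already wait on */
};

static struct ring rings[NUM_RINGS];

/* Emit a new fence at the current end of a ring's command stream. */
static struct fence ring_emit_fence(int ring)
{
    assert(ring >= 0 && ring < NUM_RINGS);
    struct fence f = { ring, ++rings[ring].emitted };
    return f;
}

/*
 * Order later work on `ring` after fence `f`.
 * Returns true if a cross-ring semaphore wait had to be queued.
 */
static bool ring_sync_to(int ring, struct fence f)
{
    assert(ring >= 0 && ring < NUM_RINGS);
    if (f.seq == 0)
        return false;             /* nothing to wait for */
    if (f.ring == ring)
        return false;             /* same ring executes in order */
    if (rings[ring].wait_seq[f.ring] >= f.seq)
        return false;             /* already waiting on this or a later fence */
    rings[ring].wait_seq[f.ring] = f.seq;
    return true;
}

/* Memory type manager carrying the fence of its last eviction. */
struct mem_manager {
    struct fence last_eviction;
};

/* Schedule an eviction on `ring` without waiting for it to complete. */
static void evict_async(struct mem_manager *man, int ring)
{
    man->last_eviction = ring_emit_fence(ring);
}

/* Move new contents in using `ring`; sync only when actually needed. */
static bool move_in(struct mem_manager *man, int ring)
{
    return ring_sync_to(ring, man->last_eviction);
}
```

In this model nothing blocks on the CPU at eviction time. The only cost
is a semaphore wait on the ring that reuses the space, and only when
that ring differs from the one that did the eviction.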

Cheers,
Jerome
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel