On Fri, Nov 30, 2012 at 12:08 PM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
> On 11/30/2012 05:30 PM, Jerome Glisse wrote:
>>
>> On Fri, Nov 30, 2012 at 4:39 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
>>>
>>> On 11/29/2012 10:58 PM, Marek Olšák wrote:
>>>>
>>>> What I tried to point out was that the synchronization shouldn't be
>>>> needed, because the CPU shouldn't do anything with the contents of
>>>> evicted buffers. The GPU moves the buffers, not the CPU. What does the
>>>> CPU do besides updating some kernel structures?
>>>>
>>>> Also, buffer deletion is something where you don't need to wait for
>>>> the buffer to become idle if you know the memory area won't be
>>>> mapped by the CPU, ever. The memory can be reclaimed right away. It
>>>> would be the GPU that moves new data in, and once that happens, the old
>>>> buffer will be trivially idle, because single-ring GPUs execute
>>>> commands in order.
>>>>
>>>> Marek
>>>
>>> Actually, asynchronous eviction / deletion is something I have been
>>> prototyping for a while but never gotten around to implementing in TTM.
>>>
>>> There are a few minor caveats:
>>>
>>> With buffer deletion, what you say is true for fixed memory, but not for
>>> TT memory, where pages are reclaimed by the system after buffer
>>> destruction. That means that we don't have to wait for idle to free GPU
>>> space, but we need to wait before pages are handed back to the system.
>>>
>>> Swapout needs to access the contents of evicted buffers, but
>>> synchronizing doesn't need to happen until just before swapout.
>>>
>>> Multi-ring / CPU support: if another ring / engine or the CPU is about
>>> to move buffer contents into VRAM or a GPU aperture that was previously
>>> evicted by another ring, it needs to sync with that eviction, but it
>>> doesn't know which buffer or even which buffers occupied the space
>>> previously. Trivially, one can attach a sync object to the memory type
>>> manager that represents the last eviction from that memory type, and
>>> *any* engine (CPU or GPU) that moves buffer contents in needs to order
>>> that movement with respect to that fence. As you say, with a single ring
>>> and no CPU fallbacks, that ordering is a no-op, but any common
>>> (non-driver-based) implementation needs to support this.
>>>
>>> A single fence attached to the memory type manager is the simplest
>>> solution, but a solution with a fence for each free region in the free
>>> list is also possible. Then TTM needs a driver callback to be able to
>>> order fences with respect to each other.
>>>
>>> /Thomas
>>>
>> Radeon already handles multi-ring and TTM interaction with what we call
>> semaphores. Semaphores are created to synchronize with fences across
>> different rings. I think the easiest solution is to just remove the bo
>> wait in TTM and let the driver handle this.
>
> The wait can be removed, but only conditioned on a driver flag that says
> it supports asynchronous buffer moves.
>
> The multi-ring case I'm talking about is:
>
> Ring 1 evicts buffer A, emits fence 0.
> Ring 2 evicts buffer B, emits fence 1.
> ...Other evictions take place on various rings, perhaps including ring 1
> and ring 2.
> Ring 3 moves buffer C into the space which happens to be the union of the
> space previously occupied by buffer A and buffer B.
>
> The question is: which fence do you want to order this move with?
> The answer is whichever of fence 0 and 1 signals last.
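
A rough sketch of what that bookkeeping could look like (hypothetical names
only, not the actual TTM or radeon interface): the memory type manager keeps
one eviction fence, the driver supplies a callback that picks whichever of
two fences signals last, and every move-in orders itself against the current
fence.

/* Purely illustrative, hypothetical types; not the actual TTM interface. */

struct evict_fence {                    /* stand-in for a driver fence */
	unsigned int ring;
	unsigned long seqno;
};

struct mem_type_manager {               /* stand-in for a TTM memory type manager */
	struct evict_fence *last_evict; /* fence of the most recent eviction */
};

struct driver_order_ops {
	/* Return whichever of the two fences signals last. */
	struct evict_fence *(*later)(struct evict_fence *a,
				     struct evict_fence *b);
	/* Make dst_ring (or the CPU) order itself against fence f. */
	void (*sync_to)(unsigned int dst_ring, struct evict_fence *f);
};

/* Called whenever a ring evicts a buffer from this memory type. */
static void note_eviction(struct mem_type_manager *man,
			  const struct driver_order_ops *ops,
			  struct evict_fence *f)
{
	man->last_evict = man->last_evict ? ops->later(man->last_evict, f) : f;
}

/* Called before any engine moves new contents into this memory type;
 * with a single ring the driver can make this a no-op. */
static void order_move_in(struct mem_type_manager *man,
			  const struct driver_order_ops *ops,
			  unsigned int dst_ring)
{
	if (man->last_evict)
		ops->sync_to(dst_ring, man->last_evict);
}

The per-free-region variant would keep one such fence on each free-list
entry instead of a single one on the whole manager.
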
>
> I think it's a reasonable thing for TTM to keep track of this, but in
> order to do so it needs a driver callback that can order two fences, and
> can order a job in the current ring with respect to a fence. In radeon's
> case that driver callback would probably insert a barrier / semaphore. In
> the case of simpler hardware it would wait on one of the fences.
>
> /Thomas
>

I don't think we can order fences easily with a clean API. I would rather
see TTM provide a list of fences to the driver and tell the driver that,
before moving this object, all the fences on this list need to be completed.
I think it's as easy as associating fences with drm_mm (well, nouveau has
its own mm stuff), but the idea would basically be that fences are
associated both with the bo and with the mm object, so you know when a
segment of memory is idle/available for use.
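
As a rough illustration of that idea (all names here are hypothetical, not
the real drm_mm or TTM API), each free segment of the memory manager could
carry the fences emitted by the evictions that freed it, and the driver
would be handed that list before anything is moved in:

#include <stdbool.h>

#define MAX_FENCES_PER_SEGMENT 8

struct seg_fence {                      /* stand-in for a driver fence */
	unsigned int ring;
	unsigned long seqno;
};

struct mm_segment {                     /* stand-in for a free drm_mm region */
	unsigned long start, size;
	struct seg_fence *fences[MAX_FENCES_PER_SEGMENT];
	unsigned int num_fences;
};

/* Record the fence of an eviction on the segment it freed. */
static bool segment_add_fence(struct mm_segment *seg, struct seg_fence *f)
{
	if (seg->num_fences >= MAX_FENCES_PER_SEGMENT)
		return false;           /* real code would wait or merge here */
	seg->fences[seg->num_fences++] = f;
	return true;
}

/* Driver decides how to honor each fence: CPU wait or a ring semaphore. */
typedef void (*sync_fn)(unsigned int dst_ring, struct seg_fence *f);

/* What TTM would call just before moving a buffer into 'seg'. */
static void segment_sync_before_move(struct mm_segment *seg,
				     unsigned int dst_ring, sync_fn sync)
{
	unsigned int i;

	for (i = 0; i < seg->num_fences; i++)
		sync(dst_ring, seg->fences[i]);
	seg->num_fences = 0;            /* all evictions ordered; segment reusable */
}

On a single-ring GPU with no CPU fallbacks the driver's sync callback can
simply be empty, which is the "ordering is a no-op" case Thomas mentions
above.

Cheers,
Jerome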