Re: [PATCH] drm/ttm: add minimum residency constraint for bo eviction

Thomas Hellstrom <thomas@xxxxxxxxxxxx> · Fri, 30 Nov 2012 09:38:41 +0100

On 11/29/2012 10:58 PM, Marek Olšák wrote:
On Thu, Nov 29, 2012 at 9:33 PM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
On 11/29/2012 01:52 PM, Marek Olšák wrote:
On Thu, Nov 29, 2012 at 9:04 AM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
On 11/29/2012 03:15 AM, Marek Olšák wrote:
On Thu, Nov 29, 2012 at 12:44 AM, Alan Swanson <swanson@xxxxxxxxx> wrote:
On Wed, 2012-11-28 at 18:24 -0500, Jerome Glisse wrote:
On Wed, Nov 28, 2012 at 6:18 PM, Thomas Hellstrom <thomas@xxxxxxxxxxxx> wrote:
On 11/28/2012 04:58 PM, j.glisse@xxxxxxxxx wrote:
From: Jerome Glisse <jglisse@xxxxxxxxxx>

This patch adds a minimum residency time, configurable for each memory
pool (VRAM, GTT, ...). The intention is to avoid so much memory eviction
from VRAM that the GPU ends up spending pretty much all its time moving
things in and out.

This patch seems odd to me.

It seems the net effect is to refuse evictions from VRAM and make
buffers go somewhere else, and that makes things faster?

Why don't they go there in the first place instead of trying to force
them into VRAM, when VRAM is full?

/Thomas
It's mostly a side effect of cs and of validating buffers on each cs.
If boA is in cs1 but not in cs2, and boB is in cs2 but not in cs1, then
boA could be evicted by cs2 and boB moved in. If the next cs, i.e. cs3,
is like cs1, then boA moves back again and boB is evicted; then cs4
references boB but not boA, so boA gets evicted and boB moves in... So
TTM just spends its time doing eviction, but it does so because the
driver asks it to. Note that what is costly there is not the bo move
itself but the page allocation.
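The ping-pong described above can be reproduced with a toy model. This is a hypothetical simulation, not TTM code: VRAM is assumed to hold only two buffers, every cs is assumed to also reference a shared buffer boC (an addition for illustration), every bo referenced by a cs must be resident before it runs, and the least recently used resident bo is evicted to make room.

```python
from collections import OrderedDict


def simulate(submissions, vram_slots):
    """Count moves into VRAM and evictions under plain LRU.

    Hypothetical model: each bo occupies one slot, and every bo
    referenced by a cs must be resident before that cs runs.
    """
    resident = OrderedDict()  # bo name -> None, least recently used first
    moves_in = 0
    evictions = 0
    for cs in submissions:
        for bo in cs:
            if bo in resident:
                resident.move_to_end(bo)  # mark as most recently used
                continue
            if len(resident) >= vram_slots:
                # Evict the LRU bo, even if the next cs needs it again.
                resident.popitem(last=False)
                evictions += 1
            resident[bo] = None
            moves_in += 1
    return moves_in, evictions


# cs1/cs3 use boA, cs2/cs4 use boB; boC is a hypothetical shared bo.
pattern = [["boC", "boA"], ["boC", "boB"], ["boC", "boA"], ["boC", "boB"]]
print(simulate(pattern, vram_slots=2))  # (5, 3)
```

After the first cs, every submission costs one eviction plus one move, which is the frequency a minimum residency time would bound.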

I propose this patch to put a bound on bo eviction frequency; I
thought it might help other drivers. If you set the residency time to 0
you get the current behavior; if you don't, you enforce a minimum
residency time, which helps drivers like radeon. Of course, a proper fix
for bo eviction in radeon has to be in radeon code, and is mostly an
overhaul of how we validate bos.

But I still believe that this patch has value in itself by allowing
drivers to put a bound on buffer movement frequency.
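A minimal sketch of such a constraint, assuming each bo records when it last moved into a pool and each pool has a configurable minimum residency, with 0 keeping the current behavior. The names `Pool`, `BO`, `moved_in_at`, `min_residency` and `pick_victim` are hypothetical and do not come from the patch. Bos still inside their residency window are skipped; if none qualifies, the caller is assumed to place the new bo in another pool (e.g. GTT).

```python
from dataclasses import dataclass, field


@dataclass
class BO:
    name: str
    moved_in_at: float  # time (s) the bo last entered this pool


@dataclass
class Pool:
    min_residency: float  # seconds; 0.0 reproduces plain LRU eviction
    lru: list = field(default_factory=list)  # least recently used first

    def pick_victim(self, now):
        """Return the first LRU bo resident for at least min_residency."""
        for bo in self.lru:
            if now - bo.moved_in_at >= self.min_residency:
                return bo
        return None  # nothing evictable: place the new bo elsewhere


vram = Pool(min_residency=0.5, lru=[BO("boA", 10.0), BO("boB", 10.3)])
print(vram.pick_victim(now=10.4))  # None: both moved in < 0.5 s ago
print(vram.pick_victim(now=10.6).name)  # boA
```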

Cheers,
Jerome
So, a variation on John Carmack's recommendation from 2000 to use MRU,
not LRU, to avoid texture thrashing.

     Mar 07, 2000 - Virtualized video card local memory is The Right Thing.
     http://floodyberry.com/carmack/johnc_plan_2000.html

In fact, this was last discussed in 2005 with a patch for 1-second
stale texture eviction, and I (still) wondered why such a method was
never implemented, since it was a clear problem.
BTW, we can send end-of-frame markers to the kernel, which could be
used to implement Carmack's algorithm.
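With end-of-frame markers, the kernel could track the frame in which each bo was last used. The sketch below assumes the rule as it is paraphrased later in this thread: evict the LRU bo if it was not needed in the previous frame, otherwise evict the MRU bo. `choose_victim` and the `(name, last_frame)` bookkeeping are hypothetical, and this models a single client only.

```python
def choose_victim(lru_order, current_frame):
    """Pick an eviction victim from bos ordered least to most recently used.

    Each entry is (name, last_frame): the frame in which the bo was last
    referenced (hypothetical bookkeeping fed by end-of-frame markers).
    """
    lru_name, lru_frame = lru_order[0]
    if lru_frame < current_frame - 1:
        return lru_name  # not needed in the previous frame: plain LRU
    # The working set no longer fits: evicting the MRU bo sacrifices one
    # bo instead of cycling every bo through VRAM each frame.
    return lru_order[-1][0]


print(choose_victim([("old", 3), ("b", 9), ("c", 10)], current_frame=10))  # old
print(choose_victim([("a", 9), ("b", 9), ("c", 10)], current_frame=10))  # c
```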

Marek

It seems to me like Carmack's algorithm is quite specific to the case
where only a single GL client is running?
In theory, we could send context IDs to the kernel as well and modify
the conditional to "If the LRU texture was not needed in the previous
frame of any context".
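The multi-context condition could be checked as sketched below, assuming each context has its own frame counter (advanced by its end-of-frame markers) and each bo records, per context ID, the last frame in which that context used it. `lru_is_stale` and both mappings are hypothetical bookkeeping, not an existing kernel interface.

```python
def lru_is_stale(bo_uses, current_frames):
    """True if the bo was not needed in the previous frame of any context.

    bo_uses: {context_id: last frame (that context's numbering) using the bo}
    current_frames: {context_id: that context's current frame number}
    Contexts that never used the bo do not appear in bo_uses.
    """
    return all(last < current_frames[ctx] - 1 for ctx, last in bo_uses.items())


frames = {"ctx1": 100, "ctx2": 7}
print(lru_is_stale({"ctx1": 50, "ctx2": 3}, frames))  # True
print(lru_is_stale({"ctx1": 50, "ctx2": 6}, frames))  # False: ctx2 used it last frame
```

If this returns True the LRU bo is evicted as usual; otherwise the MRU fallback applies.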

It also seems like it's designed around the fact that when eviction
takes place, all buffer objects will be idle. With a reasonably filled
graphics fifo / ring, blindly using MRU will cause the GPU to run
synchronized.
I don't see why you would need to synchronize. If the GPU takes care
of moving buffers in and out of VRAM and there's only one ring buffer
==> no synchronization is required.
The LRU bo has a much higher probability of being idle than the MRU bo,
and waiting for the MRU bo to become idle will, in principle,
synchronize the GPU and unnecessarily drain the ring.
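The idle argument can be modelled with fence sequence numbers: a bo is idle once the GPU has retired the last cs that referenced it. The ring state below (cs 1..20 submitted, 12 completed) and the helper names are hypothetical; the point is only that a CPU-side wait on the MRU bo has to let the ring drain up to that bo's last cs.

```python
def is_idle(bo_fence, completed_seqno):
    """A bo is idle once the ring has retired the last cs that used it."""
    return bo_fence <= completed_seqno


def cs_to_wait_for(victim_fence, completed_seqno):
    """Number of submissions the CPU must wait on before touching the bo."""
    return max(0, victim_fence - completed_seqno)


# Hypothetical state: cs 1..20 submitted, the GPU has finished up to cs 12.
completed = 12
lru_fence, mru_fence = 4, 20  # last cs referencing each bo
print(is_idle(lru_fence, completed), cs_to_wait_for(lru_fence, completed))  # True 0
print(is_idle(mru_fence, completed), cs_to_wait_for(mru_fence, completed))  # False 8
```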
What I tried to point out was that the synchronization shouldn't be
needed, because the CPU shouldn't do anything with the contents of
evicted buffers. The GPU moves the buffers, not the CPU. What does the
CPU do besides updating some kernel structures?
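A toy model of this argument, assuming a single in-order ring and that buffer moves are GPU copies queued on that same ring. The `Ring` class is hypothetical: the CPU only appends commands and never waits, yet the eviction copy still runs after the last draw that used boA, and boB's upload after that.

```python
from collections import deque


class Ring:
    """Toy single ring: commands retire strictly in submission order."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def emit(self, cmd):
        # The CPU only queues the command; it does not wait for the GPU.
        self.pending.append(cmd)

    def run(self):
        # The GPU executes everything in the order it was emitted.
        while self.pending:
            self.log.append(self.pending.popleft())


ring = Ring()
ring.emit("draw using boA")
ring.emit("copy boA VRAM->GTT")  # eviction: queued, no CPU wait on boA
ring.emit("copy boB GTT->VRAM")  # reuses boA's old VRAM range for boB
ring.emit("draw using boB")
ring.run()
print(ring.log)
```

With several rings or CPU access to the buffer contents this ordering guarantee no longer holds on its own, which is where fences come back in.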

Also, buffer deletion is something where you don't need to wait for
the buffer to become idle if you know the memory area won't be
mapped by the CPU, ever. The memory can be reclaimed right away. It
would be the GPU that moves new data in, and once that happens, the old
buffer will be trivially idle, because single-ring GPUs execute
commands in order.
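A minimal sketch of immediate versus deferred reclaim on deletion, assuming one in-order ring and a per-range flag saying whether the CPU may ever map that memory. `VramManager`, `cpu_mappable` and `retire` are hypothetical names. Ranges the CPU never maps go straight back to the free list; the others wait for the old bo's fence, because CPU writes through a mapping are not ordered with the ring.

```python
class VramManager:
    """Toy VRAM range manager: immediate vs deferred reclaim on delete."""

    def __init__(self):
        self.free = []      # ranges available for new allocations
        self.deferred = []  # (range, fence) still waiting for the GPU

    def delete(self, rng, fence, cpu_mappable):
        if not cpu_mappable:
            # Only the GPU touches this memory and it executes in order,
            # so any later write lands after the old bo's last use.
            self.free.append(rng)
        else:
            # CPU access is not ordered with the ring: keep the range
            # reserved until the old bo's last cs has retired.
            self.deferred.append((rng, fence))

    def retire(self, completed_seqno):
        """Free deferred ranges whose fence has signaled."""
        still_busy = []
        for rng, fence in self.deferred:
            if fence <= completed_seqno:
                self.free.append(rng)
            else:
                still_busy.append((rng, fence))
        self.deferred = still_busy


mgr = VramManager()
mgr.delete((0, 4096), fence=20, cpu_mappable=False)
mgr.delete((4096, 8192), fence=20, cpu_mappable=True)
print(mgr.free)  # [(0, 4096)]
mgr.retire(completed_seqno=20)
print(mgr.free)  # [(0, 4096), (4096, 8192)]
```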

Yes, you're right. Sorry about that.

/Thomas

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel