Re: TTM's role in score-based eviction

On 12/09/2013 06:28 PM, Daniel Vetter wrote:
On Thu, Dec 05, 2013 at 11:45:03AM -0500, Jerome Glisse wrote:
On Thu, Dec 05, 2013 at 05:22:54PM +0100, Maarten Lankhorst wrote:
On 05-12-13 16:49, Jerome Glisse wrote:
On Thu, Dec 05, 2013 at 11:26:46AM +0100, Thomas Hellstrom wrote:
Hi!

On 12/05/2013 10:36 AM, Lauri Kasanen wrote:
Hi list, Thomas,

I will be investigating the use of a hotness score for each bo, to
replace the ping-pong causing LRU eviction in radeon*.

The goal is to put all bos that fit in VRAM there, in order of hotness;
a new bo should only be placed there if its hotness score is greater
than the lowest VRAM bo's. Then the lowest-hotness-bos in
VRAM should be evicted until the new bo fits. This should result in a
more stable set with less ping-pong.
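
For illustration, a minimal sketch of that admission rule in plain C (using <linux/list.h>; struct hot_bo, its hotness field and the VRAM accounting are invented for this example, not existing radeon code):

/*
 * Hypothetical admission check, not existing radeon/TTM code: a new bo
 * goes to VRAM only if it is hotter than the coldest resident bo, and
 * the coldest bos are evicted until the new one fits.
 */
struct hot_bo {
	struct list_head lru;		/* VRAM queue, kept coldest-first */
	unsigned long size;
	unsigned long hotness;		/* driver-computed score */
};

static bool vram_admit(struct list_head *vram_queue, struct hot_bo *incoming,
		       unsigned long *vram_free)
{
	struct hot_bo *coldest;

	while (*vram_free < incoming->size) {
		if (list_empty(vram_queue))
			return false;	/* bigger than all of VRAM */

		coldest = list_first_entry(vram_queue, struct hot_bo, lru);
		if (coldest->hotness >= incoming->hotness)
			return false;	/* not hot enough to displace anyone */

		/* the driver would move the victim to GTT here */
		list_del(&coldest->lru);
		*vram_free += coldest->size;
	}
	return true;
}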

Jerome advised that the bo placement should be done entirely outside
TTM. As I'm not (yet) too familiar with that side of the kernel, what is
the opinion of TTM folks?
There are a couple of things to be considered:
1) You need to decide where a bo to be validated should be placed.
The driver can give a list of possible placements to TTM and let
TTM decide, trying each placement in turn. A driver that thinks this
isn't sufficient can come up with its own strategy and give only a
single placement to TTM. If TTM can't satisfy that, it will give you
an error back, and the driver will need to validate with an
alternative placement. I think Radeon already does this? vmwgfx does
it to some extent.

2) As you say, TTM is evicting strictly on an lru basis, and is
maintaining one LRU list per memory type, and also a global swap lru
list for buffers that are backed by system pages (not VRAM). I guess
what you would want to do is to replace the VRAM lru list with a
priority queue where bos are continuously sorted based on hotness.
As long as you obey the locking rules (sketched in code below):
*) Locking order is bo::reserve -> lru-lock
*) When walking the queue with the lru-lock held, you must therefore
tryreserve if you want to reserve an object on the queue
*) bos need to be removed from the queue as soon as they are reserved
*) Don't remove a bo from the queue unless it is reserved
Nothing stops you from doing this in the driver, but OTOH if this
ends up being useful for other drivers I'd prefer we put it into
TTM.
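
A minimal sketch of a queue walk obeying those rules; lru_lock, vram_queue, struct my_bo, the hot_lru member and bo_tryreserve()/bo_unreserve() are stand-ins for whatever the driver or TTM actually provides, not existing TTM symbols:

static DEFINE_SPINLOCK(lru_lock);	/* placeholder for the lru lock */
static LIST_HEAD(vram_queue);		/* hotness-sorted, coldest first */

static struct my_bo *pick_eviction_victim(void)
{
	struct my_bo *bo, *victim = NULL;

	spin_lock(&lru_lock);
	list_for_each_entry(bo, &vram_queue, hot_lru) {
		/* lock order is reserve -> lru-lock, so only try-reserve here */
		if (!bo_tryreserve(bo))
			continue;	/* held by someone else, skip it */

		/* reserved, so take it off the queue immediately */
		list_del_init(&bo->hot_lru);
		victim = bo;
		break;
	}
	spin_unlock(&lru_lock);

	return victim;	/* caller evicts it and then unreserves */
}

The caller is responsible for unreserving the victim again, whether or not the actual move succeeds.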
It will be useful to others; the point I am making is that others might
not use TTM at all, and there is nothing about bo placement that needs
to be TTM-specific.

Avoiding bo eviction from the LRU list is just a matter of the driver
never over-committing bos on a pool of memory and doing eviction by
itself, i.e. deciding on a new placement for a bo and moving that bo
out before moving another bo in, which can be done outside TTM.
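
As a rough sketch of that driver-side flow; my_bo_place(), pick_victim(), the MY_DOMAIN_* flags and the hotness comparison are all invented for illustration, and a real driver would ultimately go through ttm_bo_validate() with a single-entry placement list:

static int my_place_in_vram(struct my_bo *bo)
{
	struct my_bo *victim;
	int ret;

	for (;;) {
		ret = my_bo_place(bo, MY_DOMAIN_VRAM);
		if (ret != -ENOMEM)
			return ret;		/* placed, or a real error */

		/* VRAM is full: the driver picks and moves out its own victim */
		victim = pick_victim(MY_DOMAIN_VRAM);
		if (!victim || victim->hotness >= bo->hotness)
			return my_bo_place(bo, MY_DOMAIN_GTT);	/* fall back */

		ret = my_bo_place(victim, MY_DOMAIN_GTT);
		if (ret)
			return ret;
	}
}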

The only thing that needs modification in TTM is work to control
memory fragmentation, but this should not be enforced on all TTM
users and should be a runtime decision. GPUs with a virtual address
space can scatter a bo through VRAM by using VRAM pages, making
memory fragmentation pretty much a non-issue (some GPUs still need
contiguous memory for scan-out buffers or other specific buffers).

You're correct that it COULD be done like that, but that's a nasty workaround.
Simply assign a priority to each buffer, then modify ttm_bo_add_to_lru,
ttm_bo_swapout and ttm_mem_evict_first and be done with it.
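
For illustration, a priority-aware insertion could look roughly like this; the hotness field and the coldest-first ordering are assumptions, not the current ttm_bo_add_to_lru() code:

static void bo_add_to_lru_sorted(struct my_bo *nbo, struct list_head *head)
{
	struct my_bo *pos;

	/* keep the list coldest-first so eviction can simply pop the head */
	list_for_each_entry(pos, head, lru) {
		if (pos->hotness > nbo->hotness) {
			list_add_tail(&nbo->lru, &pos->lru);	/* insert before pos */
			return;
		}
	}
	list_add_tail(&nbo->lru, head);		/* hottest so far, goes last */
}

With the list kept sorted this way, the existing evict-from-the-head logic keeps working unchanged.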

Memory management is exactly the kind of thing that should be done in TTM,
so why have something 'generic' for something that's little more than a renamed priority queue?
The end score, and the use of that score for placement decisions, can be
done in TTM, but the score computation and the heuristics related to it should not.
Btw, another thing to look at is the eviction roster in drm_mm. It's
completely standalone, the only thing it requires is that you have a
deterministic order to add objects to it and unroll them (but that can
always be solved by putting objects on a temporary list).

That way, if you have some big objects and a highly fragmented VRAM, you
don't end up evicting a big load of data, but just a perfectly-sized hole.
All the scanning is linear, but in my experience with the implementation in
i915.ko that's not a real-world issue. The drm_mm roster supports all the
same features as the normal block allocator, so range-restricted
allocations (and everything else) also work. See i915_gem_evict_something in
i915_gem_evict.c for how it all works (yeah, no docs, but writing those
for drm_mm.c is on my todo list somewhere).
-Daniel
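
For reference, the scan pattern looks roughly like this; it is loosely modeled on i915_gem_evict_something(), the drm_mm_init_scan()/drm_mm_scan_add_block()/drm_mm_scan_remove_block() names follow the drm_mm API of that era (later kernels changed the signatures), and struct my_bo and evict_bo() are purely illustrative:

static int evict_for_hole(struct drm_mm *mm, struct list_head *lru,
			  unsigned long size, unsigned alignment)
{
	struct my_bo *bo, *next;
	LIST_HEAD(scan_list);
	LIST_HEAD(evict_list);
	bool found = false;

	drm_mm_init_scan(mm, size, alignment, 0);

	/* feed candidates in a deterministic (LRU) order until a hole appears */
	list_for_each_entry(bo, lru, lru) {
		list_add(&bo->scan_link, &scan_list);
		if (drm_mm_scan_add_block(&bo->node)) {
			found = true;
			break;
		}
	}

	/* every scanned block must be unrolled, in reverse order of addition;
	 * only the ones inside the chosen hole actually need to be evicted */
	list_for_each_entry_safe(bo, next, &scan_list, scan_link) {
		list_del(&bo->scan_link);
		if (drm_mm_scan_remove_block(&bo->node))
			list_add(&bo->scan_link, &evict_list);
	}

	if (!found)
		return -ENOSPC;

	list_for_each_entry_safe(bo, next, &evict_list, scan_link) {
		list_del(&bo->scan_link);
		evict_bo(bo);	/* hypothetical: unbind + drm_mm_remove_node() */
	}

	return 0;
}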

The problem with combining this with TTM is that eviction by default doesn't take place under a mutex, so multiple threads may be traversing the LRU list more or less at the same time, evicting stuff.

However, when it comes to eviction, that's not really a behaviour we need to preserve. It would, IMO, be OK to take a "big" per-memory-type mutex around eviction, but then one would have to sort out how / whether swapping and delayed destruction would need to wait on that mutex as well....

/Thomas
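
A minimal sketch of that "big" per-memory-type mutex, assuming a hypothetical evict_mutex field in the memory-type manager and a my_evict_first() helper; it leaves the swapout / delayed-destruction question open:

static int evict_serialized(struct my_mem_type_manager *man,
			    unsigned long needed)
{
	int ret;

	mutex_lock(&man->evict_mutex);	/* one evicting thread per memory type */
	ret = my_evict_first(man, needed);
	mutex_unlock(&man->evict_mutex);

	return ret;
}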



