On Thu, Dec 05, 2013 at 11:45:03AM -0500, Jerome Glisse wrote:
> On Thu, Dec 05, 2013 at 05:22:54PM +0100, Maarten Lankhorst wrote:
> > On 05-12-13 16:49, Jerome Glisse wrote:
> > > On Thu, Dec 05, 2013 at 11:26:46AM +0100, Thomas Hellstrom wrote:
> > >> Hi!
> > >>
> > >> On 12/05/2013 10:36 AM, Lauri Kasanen wrote:
> > >>> Hi list, Thomas,
> > >>>
> > >>> I will be investigating the use of a hotness score for each bo,
> > >>> to replace the ping-pong-causing LRU eviction in radeon*.
> > >>>
> > >>> The goal is to put all bos that fit in VRAM there, in order of
> > >>> hotness; a new bo should only be placed there if its hotness
> > >>> score is greater than the lowest VRAM bo's. Then the
> > >>> lowest-hotness bos in VRAM should be evicted until the new bo
> > >>> fits. This should result in a more stable set with less
> > >>> ping-pong.
> > >>>
> > >>> Jerome advised that the bo placement should be done entirely
> > >>> outside TTM. As I'm not (yet) too familiar with that side of
> > >>> the kernel, what is the opinion of TTM folks?
> > >> There are a couple of things to be considered:
> > >> 1) You need to decide where a bo to be validated should be
> > >> placed. The driver can give a list of possible placements to
> > >> TTM and let TTM decide, trying each placement in turn. A driver
> > >> that thinks this isn't sufficient can come up with its own
> > >> strategy and give only a single placement to TTM. If TTM can't
> > >> satisfy that, it will give you an error back, and the driver
> > >> will need to validate with an alternative placement. I think
> > >> Radeon already does this? vmwgfx does it to some extent.
> > >>
> > >> 2) As you say, TTM is evicting strictly on an LRU basis, and is
> > >> maintaining one LRU list per memory type, and also a global
> > >> swap LRU list for buffers that are backed by system pages (not
> > >> VRAM). I guess what you would want to do is to replace the VRAM
> > >> LRU list with a priority queue where bos are continuously
> > >> sorted based on hotness. As long as you obey the locking rules:
> > >> *) Locking order is bo::reserve -> lru-lock
> > >> *) When walking the queue with the lru-lock held, you must
> > >>    therefore tryreserve if you want to reserve an object on the
> > >>    queue
> > >> *) bos need to be removed from the queue as soon as they are
> > >>    reserved
> > >> *) Don't remove a bo from the queue unless it is reserved
> > >> Nothing stops you from doing this in the driver, but OTOH if
> > >> this ends up being useful for other drivers I'd prefer we put
> > >> it into TTM.
> > > It will be useful to others; the point I am making is that
> > > others might not use TTM, and there is nothing about bo
> > > placement that needs to be TTM-specific.
> > >
> > > Avoiding bo eviction from the LRU list is just a matter of the
> > > driver never overcommitting bos on a pool of memory and doing
> > > eviction by itself, i.e. deciding on a new placement for a bo
> > > and moving it out before moving in another bo, which can be done
> > > outside TTM.
> > >
> > > The only thing that needs modification in TTM is the work done
> > > to control memory fragmentation, but this should not be enforced
> > > on all TTM users and should be a runtime decision. GPUs with a
> > > virtual address space can scatter bos through VRAM by using VRAM
> > > pages, making memory fragmentation pretty much a non-issue (some
> > > GPUs still need contiguous memory for scanout buffers or other
> > > specific buffers).
> > >
> > You're correct it COULD be done like that, but that's a nasty
> > workaround.
> >
> > Simply assign a priority to each buffer, then modify
> > ttm_bo_add_to_lru, ttm_bo_swapout and ttm_mem_evict_first, and be
> > done with it.
> >
> > Memory management is exactly the kind of thing that should be done
> > in TTM, so why have something 'generic' for something that's
> > little more than a renamed priority queue?
>
> The end score, and the use of that score for placement decisions,
> can be done in TTM, but the whole score computation and the
> heuristics related to it should not.
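To make the hotness-queue idea above concrete, here is a minimal
sketch that obeys the locking rules Thomas lists. This is not
existing TTM code: my_bo, my_bo_tryreserve() and my_bo_evict() are
hypothetical driver-side names; a real implementation would go
through TTM's reservation machinery instead.

/*
 * Hypothetical sketch, not TTM API: a hotness-sorted VRAM queue.
 */
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct my_bo {
	struct list_head lru;		/* link in the hotness queue */
	unsigned long hotness;		/* higher = keep in VRAM longer */
};

static DEFINE_SPINLOCK(lru_lock);	/* protects vram_queue */
static LIST_HEAD(vram_queue);		/* sorted, coldest bo at head */

static bool my_bo_tryreserve(struct my_bo *bo); /* non-blocking reserve */
static int my_bo_evict(struct my_bo *bo);	/* move out + unreserve */

/* Keep the queue sorted so the head is always the eviction candidate. */
static void my_bo_add_to_queue(struct my_bo *bo)
{
	struct my_bo *pos;

	spin_lock(&lru_lock);
	list_for_each_entry(pos, &vram_queue, lru)
		if (pos->hotness > bo->hotness)
			break;
	/* Insert before the first hotter bo (or at the tail). */
	list_add_tail(&bo->lru, &pos->lru);
	spin_unlock(&lru_lock);
}

static int my_evict_coldest(void)
{
	struct my_bo *bo;

	spin_lock(&lru_lock);
	list_for_each_entry(bo, &vram_queue, lru) {
		/* Lock order is bo::reserve -> lru-lock, so only ever
		 * tryreserve while holding the lru lock. */
		if (!my_bo_tryreserve(bo))
			continue;
		/* Off the queue as soon as it is reserved. */
		list_del_init(&bo->lru);
		spin_unlock(&lru_lock);
		return my_bo_evict(bo);
	}
	spin_unlock(&lru_lock);
	return -EBUSY;			/* everything busy right now */
}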
Btw, another thing to look at is the eviction roster in drm_mm. It's
completely standalone; the only thing it requires is that you add
objects to it and unroll them in a deterministic order (which can
always be solved by putting the objects on a temporary list). That
way, if you have some big objects and highly fragmented VRAM, you
don't end up evicting a big load of data, just a perfectly sized
hole.

All the scanning is linear, but in my experience with the
implementation in i915.ko that's not a real-world issue. The drm_mm
roster supports all the same features as the normal block allocator,
so range-restricted allocations (and everything else) also work. See
i915_gem_evict_something() in i915_gem_evict.c for how it all works
(yeah, no docs, but writing those for drm_mm.c is on my todo list
somewhere).

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
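For reference, a rough sketch of the scan pattern described above,
modeled on i915_gem_evict_something() as of the v3.12-era code. The
drm_mm_init_scan()/drm_mm_scan_add_block()/drm_mm_scan_remove_block()
calls are the drm_mm interface of that era; my_vma, the LRU list and
my_vma_unbind() are hypothetical driver-side pieces.

#include <drm/drm_mm.h>
#include <linux/errno.h>
#include <linux/list.h>

struct my_vma {
	struct drm_mm_node node;	/* the VRAM allocation */
	struct list_head lru_link;	/* position on the driver LRU */
	struct list_head scan_link;	/* temporary link while scanning */
};

static int my_vma_unbind(struct my_vma *vma);	/* hypothetical eviction */

static int my_evict_something(struct drm_mm *mm, struct list_head *lru,
			      unsigned long size, unsigned alignment)
{
	struct my_vma *vma, *next;
	LIST_HEAD(unwind);
	LIST_HEAD(eviction);
	bool found = false;

	drm_mm_init_scan(mm, size, alignment, 0 /* color */);

	/* Feed bos to the scanner in LRU order until it reports that
	 * the scanned blocks add up to a big enough hole. */
	list_for_each_entry(vma, lru, lru_link) {
		list_add(&vma->scan_link, &unwind); /* LIFO for unroll */
		if (drm_mm_scan_add_block(&vma->node)) {
			found = true;
			break;
		}
	}

	/* Unroll in reverse order of addition, as drm_mm requires.
	 * scan_remove_block() returns true only for the blocks that
	 * actually make up the hole; everything else stays put. */
	list_for_each_entry_safe(vma, next, &unwind, scan_link) {
		if (drm_mm_scan_remove_block(&vma->node))
			list_move(&vma->scan_link, &eviction);
		else
			list_del_init(&vma->scan_link);
	}

	if (!found)
		return -ENOSPC;

	/* Only now is it safe to modify the drm_mm: evict the hole. */
	list_for_each_entry_safe(vma, next, &eviction, scan_link) {
		list_del_init(&vma->scan_link);
		my_vma_unbind(vma);
	}
	return 0;	/* caller can now allocate its perfectly sized hole */
}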