On Fri, 2024-02-16 at 15:00 +0100, Christian König wrote:
> Am 16.02.24 um 14:13 schrieb Thomas Hellström:
> > This patch-set is a prerequisite for a standalone TTM shrinker
> > and for exhaustive TTM eviction using sleeping dma_resv locks,
> > which is the motivation for it.
> >
> > Currently, when unlocking the TTM LRU list lock, iteration needs
> > to be restarted from the beginning rather than from the next LRU
> > list node. This can potentially be a big problem, because if
> > eviction or shrinking fails for whatever reason after unlock,
> > restarting is likely to cause the same failure over and over
> > again.
>
> Oh, yes please. I have been working on that problem before as well,
> but wasn't able to come up with something working.
>
> > There are various schemes to be able to continue the list
> > iteration from where we left off. One such scheme, used by the
> > GEM LRU list traversal, is to pull items already considered off
> > the LRU list and reinsert them when iteration is done.
> > This has the drawback that concurrent list iteration doesn't see
> > the complete list (which is bad for exhaustive eviction) and also
> > doesn't lend itself well to bulk-move sublists, since these will
> > be split in the process when items from those lists are
> > temporarily pulled from the list and moved to the list tail.
>
> Completely agree that this is not a desirable solution.
>
> > The approach taken here is that list iterators insert themselves
> > into the list's next position using a special list node. Iteration
> > then uses that list node as the starting point when restarting.
> > Concurrent iterators simply skip over the special list nodes.
> >
> > This is implemented in patches 1 and 2.
> >
> > For bulk-move sublists the approach is the same, but when a
> > bulk-move sublist is moved to the tail, the iterator is also
> > moved, causing us to skip parts of the list. That is undesirable.
> > Patch 3 deals with that: when the iterator detects it is
> > traversing a sublist, it inserts a second restarting point just
> > after the sublist, and if the sublist is moved to the tail,
> > it just uses the second restarting point instead.
> >
> > This is implemented in patch 3.
>
> Interesting approach, that is probably even better than what I
> tried.
>
> My approach was basically to not only lock the current BO, but also
> the next one. Since only a locked BO can move on the LRU, we
> effectively created an anchor.
>
> Before I dig into the code, a couple of questions:

These are described in the patches, but brief comments inline.

> 1. How do you distinguish BOs and iteration anchors on the LRU?

Using a struct ttm_lru_item, containing a struct list_head and the
type. List nodes embed this instead of a struct list_head. It is
larger than a plain list head, but makes explicit what we're doing.

> 2. How do you detect that a bulk list moved on the LRU?

A u64 age counter on the bulk move that we compare against. It is
updated each time the bulk move moves.

> 3. How do you remove the iteration anchors from the bulk list?

A function call at the end of iteration, which the iterating
function is required to call.

/Thomas

> Regards,
> Christian.
>
> > The restartable property is used in patch 4 to restart swapout
> > if needed, but the main purpose is that this paves the way for
> > shrinker- and exhaustive eviction.
> >
> > Cc: Christian König <christian.koenig@xxxxxxx>
> > Cc: <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
> >
> > Thomas Hellström (4):
> >   drm/ttm: Allow TTM LRU list nodes of different types
> >   drm/ttm: Use LRU hitches
> >   drm/ttm: Consider hitch moves within bulk sublist moves
> >   drm/ttm: Allow continued swapout after -ENOSPC failure
> >
> >  drivers/gpu/drm/ttm/ttm_bo.c       |   1 +
> >  drivers/gpu/drm/ttm/ttm_device.c   |  33 +++--
> >  drivers/gpu/drm/ttm/ttm_resource.c | 202 +++++++++++++++++++++++------
> >  include/drm/ttm/ttm_resource.h     |  91 +++++++++++--
> >  4 files changed, 267 insertions(+), 60 deletions(-)
> >