On Wed, Jun 19, 2024 at 09:31:33AM +0200, Thomas Hellström wrote:
> Hi, Matthew.
> 
> Thanks for reviewing.
> 
> On Tue, 2024-06-18 at 22:11 +0000, Matthew Brost wrote:
> > On Tue, Jun 18, 2024 at 09:18:13AM +0200, Thomas Hellström wrote:
> > 
> > Replying to correct version...
> > 
> > > Provide a generic LRU walker in TTM, in the spirit of
> > > drm_gem_lru_scan() but building on the restartable TTM LRU
> > > functionality.
> > >
> > > The LRU walker optionally supports locking objects as part of
> > > a ww mutex locking transaction, to mimic to some extent the
> > > current functionality in ttm. However any -EDEADLK return
> > > is converted to -ENOMEM, so that the driver will need to back
> > > off and possibly retry without being able to keep the
> > > ticket.
> > >
> > 
> > Wouldn't the backoff be unlock everything but keep the ticket?
> 
> We can't do that (yet) since we don't have the drm_exec or similar
> functionality. The missing part is that if we keep the ticket, it's in
> a contended state, which means we need to slow-lock the contending
> lock as the first lock. And the caller doesn't know which lock is the
> contending one...
> 
> That is all addressed in the RFC part of the series that I left out
> for now. This is only trying to mimic current functionality.
> 

Got it.

> > > v3:
> > > - Move the helper to core ttm.
> > > - Remove the drm_exec usage from it for now, it will be
> > >   reintroduced later in the series.
> > > v4:
> > > - Handle the -EALREADY case if ticketlocking.
> > >
> > > Cc: Christian König <christian.koenig@xxxxxxx>
> > > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
> > > Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > Cc: <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_bo_util.c | 145 ++++++++++++++++++++++++++++++
> > >  include/drm/ttm/ttm_bo.h          |  32 +++++++
> > >  2 files changed, 177 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > index 0b3f4267130c..45fcaf6f8644 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > @@ -768,3 +768,148 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
> > >  	ttm_tt_destroy(bo->bdev, ttm);
> > >  	return ret;
> > >  }
> > > +
> > > +static bool ttm_lru_walk_trylock(struct ttm_lru_walk *walk,
> > > +				 struct ttm_buffer_object *bo,
> > > +				 bool *needs_unlock)
> > > +{
> > > +	struct ttm_operation_ctx *ctx = walk->ctx;
> > > +
> > > +	*needs_unlock = false;
> > > +
> > > +	if (dma_resv_trylock(bo->base.resv)) {
> > > +		*needs_unlock = true;
> > > +		return true;
> > > +	}
> > > +
> > > +	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
> > > +		dma_resv_assert_held(bo->base.resv);
> > > +		return true;
> > > +	}
> > > +
> > 
> > Any reason this is done after the trylock? Just kinda goofy, as if
> > this statement is true the dma_resv_trylock will always fail.
> 
> It should work either way. I guess I had viewed it as "trylock first,
> if that fails, attempt any exception". I guess if we want to optimize
> performance for shared lock implementations, moving it first might
> avoid the atomic trylock operation, but I wouldn't expect a noticeable
> difference.
> 

Agree it works either way.
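
As an aside, for anyone following along: the backoff that keeping the
ticket would require looks roughly like the sketch below. This is only
an illustration of the ww transaction pattern (roughly what drm_exec
provides), not code from this series; lock_all_resvs() and the bos[]
array are made up for the example, the caller is assumed to hold a
ww_acquire_ctx set up with ww_acquire_init(), and duplicate resvs
(-EALREADY) are not handled.

/* Sketch only; assumes <linux/dma-resv.h> and <drm/ttm/ttm_bo.h>. */
static int lock_all_resvs(struct ttm_buffer_object **bos,
			  unsigned int count,
			  struct ww_acquire_ctx *ticket)
{
	struct dma_resv *contended = NULL, *slow_locked = NULL;
	unsigned int i;

retry:
	if (contended) {
		/* The contending lock must be slow-locked first ... */
		dma_resv_lock_slow(contended, ticket);
		slow_locked = contended;
		contended = NULL;
	}

	for (i = 0; i < count; i++) {
		struct dma_resv *resv = bos[i]->base.resv;

		/* Skip the object we already slow-locked above. */
		if (resv == slow_locked) {
			slow_locked = NULL;
			continue;
		}

		/* Non-interruptible ticketed lock: 0 or -EDEADLK. */
		if (dma_resv_lock(resv, ticket) == -EDEADLK) {
			/*
			 * ... and this is the information the walker's
			 * caller doesn't have: which lock contended.
			 */
			contended = resv;
			while (i--)
				dma_resv_unlock(bos[i]->base.resv);
			if (slow_locked)
				dma_resv_unlock(slow_locked);
			goto retry;
		}
	}

	return 0;
}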
> > 
> > > +	return false;
> > > +}
> > > +
> > > +static int ttm_lru_walk_ticketlock(struct ttm_lru_walk *walk,
> > > +				   struct ttm_buffer_object *bo,
> > > +				   bool *needs_unlock)
> > > +{
> > > +	struct dma_resv *resv = bo->base.resv;
> > > +	int ret;
> > > +
> > 
> > I suppose we don't have asserts here like in Xe but if we did,
> > assert(walk->ticket)?
> 
> I agree. I think we'd really want a TTM assert or warning that could
> be compiled away. In any case, I only expect a single caller of this
> function.
> 
> > 
> > > +	if (walk->ctx->interruptible)
> > > +		ret = dma_resv_lock_interruptible(resv, walk->ticket);
> > > +	else
> > > +		ret = dma_resv_lock(resv, walk->ticket);
> > > +
> > > +	if (!ret) {
> > > +		*needs_unlock = true;
> > > +		/* Only a single ticketlock per loop. */
> > > +		walk->ticket = NULL;
> > 
> > Can you explain this a bit more? I see that once walk->ticket is set
> > to NULL this function will not be called again (i.e. only trylocking
> > will be used). I want to understand the reasoning for this.
> > 
> > It might be helpful to have a more lengthy explanation in the
> > comments of the code too.
> 
> I can add a more thorough explanation. Again, this is trying to mimic
> the current code, which does a walk of trylocking, then a single
> ticketlock more as a sort of "wait a while to be able to make
> progress", and then resorts to trylocking again.
> 
> The real reason to avoid multiple ticketlocks here is that each has a
> chance of failing with -EDEADLK (in particular with -EDEADLK injection
> enabled), which would translate to an -ENOMEM in the caller.
> 

Makes sense.

> > 
> > > +	} else if (ret == -EDEADLK) {
> > > +		/* Caller needs to exit the ww transaction. */
> > > +		ret = -ENOSPC;
> > 
> > The commit message says -ENOMEM.
> 
> The -ENOSPC is converted to -ENOMEM for the driver in
> ttm_bo_validate(). I could explain a bit more in the commit message.
> 

Forgot about that.

> > 
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool locked)
> > > +{
> > > +	if (locked)
> > > +		dma_resv_unlock(bo->base.resv);
> > > +}
> > > +
> > > +/**
> > > + * ttm_lru_walk_for_evict() - Perform a LRU list walk, with actions taken on
> > > + * valid items.
> > > + * @walk: describes the walk and actions taken
> > > + * @bdev: The TTM device.
> > > + * @man: The struct ttm_resource manager whose LRU lists we're walking.
> > > + * @target: The end condition for the walk.
> > > + *
> > > + * The LRU lists of @man are walked, and for each struct ttm_resource
> > > + * encountered, the corresponding ttm_buffer_object is locked and a
> > > + * reference is taken on it, and the LRU lock is dropped. The LRU lock
> > > + * may be dropped before locking and, in that case, it's verified that
> > > + * the item actually remains on the LRU list after the lock, and that
> > > + * the buffer object didn't switch resource in between.
> > > + *
> > > + * With a locked object, the actions indicated by @walk->process_bo are
> > > + * performed, and after that, the bo is unlocked, the refcount dropped
> > > + * and the next struct ttm_resource is processed. Here, the walker
> > > + * relies on TTM's restartable LRU list implementation.
> > > + *
> > > + * Typically @walk->process_bo() would return the number of pages
> > > + * evicted, swapped or shrunken, so that when the total exceeds
> > > + * @target, or when the LRU list has been walked in full, iteration is
> > > + * terminated. It's also terminated on error. Note that the definition
> > > + * of @target is done by the caller; it could have a different meaning
> > > + * than the number of pages.
> > > + *
> > > + * Note that with the way dma_resv individualization is done, locking
> > > + * needs to be done either with the LRU lock held (trylocking only) or
> > > + * with a reference on the object.
> > > + *
> > > + * Return: The progress made towards target or negative error code on
> > > + * error.
> > > + */
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long target)
> > > +{
> > > +	struct ttm_resource_cursor cursor;
> > > +	struct ttm_resource *res;
> > > +	long sofar = 0;
> > 
> > s/sofar/evicted?
> 
> That's not always the case. When used for eviction it is actually 0 if
> the new allocation failed, 1 on success. It should be interpreted as
> "progress towards target", so perhaps progress?
> 

progress sounds good.

> > 
> > > +	long lret;
> > > +
> > > +	spin_lock(&bdev->lru_lock);
> > > +	ttm_resource_manager_for_each_res(man, &cursor, res) {
> > > +		struct ttm_buffer_object *bo = res->bo;
> > > +		bool bo_needs_unlock = false;
> > > +		bool bo_locked = false;
> > > +		int mem_type;
> > > +
> > > +		if (!bo || bo->resource != res)
> > > +			continue;
> > > +
> > > +		if (ttm_lru_walk_trylock(walk, bo, &bo_needs_unlock))
> > > +			bo_locked = true;
> > > +		else if ((!walk->ticket) || walk->ctx->no_wait_gpu ||
> > 
> > Nit - (!walk->ticket) could just be !walk->ticket.
> 
> Will fix.
> 
> > 
> > > +			 walk->trylock_only)
> > > +			continue;
> > > +
> > > +		if (!ttm_bo_get_unless_zero(bo)) {
> > > +			ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +			continue;
> > > +		}
> > > +
> > 
> > This is a kinda goofy pattern too; typically in code a
> > get_unless_zero is done before trying to lock the object, not after.
> > Even odder here, the bo could or could not be locked depending on
> > the outcome of ttm_lru_walk_trylock. Is this covering the
> > individualization case? Would it make more sense to move
> > ttm_bo_get_unless_zero before the trylock, or is that to avoid a put
> > on trylock failure + continue?
> 
> I guess this is still a remnant from the old code: ttm_bos can't be
> put in atomic context, so we had to unlock the lru_lock to put (which
> we still do). However, without the restartable lists that we have with
> this series, list traversal would then have to be reset.
> 
> If we were to change the order of trylock and get_unless_zero now, we
> could do that, but that would mean unnecessary refcounting and
> dropping the lru_lock in the case of trylock failure.
> 

Missed that putting in atomic context is not allowed. Seems fine as is.

> > 
> > > +		mem_type = res->mem_type;
> > > +		spin_unlock(&bdev->lru_lock);
> > > +
> > > +		lret = 0;
> > > +		if (!bo_locked && walk->ticket)
> > 
> > As above, could you explain the ticket usage a bit more?
> > 
> > Also you shouldn't need to check the ticket here; there is a
> > !walk->ticket check above which triggers a continue.
> 
> I hope I explained sufficiently above. Please get back otherwise, and
> I'll remove the walk->ticket check.
> 

You did explain sufficiently.
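
To spell out the trade-off just discussed, the reversed order would be
something like the fragment below (hypothetical, not a suggested
change, and it ignores the ticketlock path for brevity). It is safe,
but it pays for a reference and an lru_lock cycle on every trylock
failure, since the put may sleep and therefore can't happen under the
spinlock:

		/* Hypothetical reordering of the loop body above. */
		if (!ttm_bo_get_unless_zero(bo))
			continue;

		if (!ttm_lru_walk_trylock(walk, bo, &bo_needs_unlock)) {
			/* ttm_bo_put() may sleep, so drop the spinlock. */
			spin_unlock(&bdev->lru_lock);
			ttm_bo_put(bo);
			spin_lock(&bdev->lru_lock);
			continue;
		}

The restartable cursor makes the relock safe, but the extra refcount
churn buys nothing here.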
> > 
> > > +			lret = ttm_lru_walk_ticketlock(walk, bo, &bo_needs_unlock);
> > > +
> > > +		/*
> > > +		 * Note that in between the release of the lru lock and the
> > > +		 * ticketlock, the bo may have switched resource, and also
> > > +		 * memory type, since the resource may have been freed and
> > > +		 * allocated again with a different memory type. In that
> > > +		 * case, just skip it.
> > > +		 */
> > > +		if (!lret && bo->resource == res && res->mem_type == mem_type)
> > > +			lret = walk->ops->process_bo(walk, bo);
> > > +
> > > +		ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +		ttm_bo_put(bo);
> > > +		if (lret == -EBUSY || lret == -EALREADY)
> > > +			lret = 0;
> > 
> > What is the usage of these error codes?
> > 
> > -EALREADY means the resv is locked with the current ticket, right?
> > Wouldn't we want to call process_bo in this case too?
> 
> No, then we might evict our own working set. Processing of shared bos
> is handled as before with ctx->allow_res_evict.
> 

Ah, yes. Makes sense.

> > 
> > -EBUSY I need some help figuring out.
> 
> -EBUSY is the return of ttm_bo_wait_ctx() in case we're in no-wait-gpu
> context, like direct reclaim. If the bo is then busy, we simply skip
> to the next bo. If process_bo() waits for gpu idle using the above
> function, this is to catch that error code.
> 

Should probably add that to the process_bo() docs.

> > 
> > > +		sofar = (lret < 0) ? lret : sofar + lret;
> > > +		if (sofar < 0 || sofar >= target)
> > > +			goto out;
> > > +
> > 
> > Here we have dropped the BO unlock. What prevents the BO from being
> > moved back to the resource we just evicted it from, resulting in
> > sofar not being accurate?
> 
> For eviction, the byte-count is not used, but rather the success of
> allocating the new resource.
> 
> For shrinking, any process could allocate what we just shrunk, and
> even so, shrinking does not always mean that the memory is immediately
> available, so the byte-count will always be a rough estimate.
> 
> A related question is "what prevents anyone else from stealing memory
> that we just evicted", and the answer to that is "Nothing... yet". The
> exhaustive eviction part handles that with drm_exec keeping the bo
> locks of evicted objects until we make progress. Eventually we'd hold
> enough locks to block all other competitors.
> 
> Unrelated to this patch, Christian and I had a discussion on whether
> to unlock those when we've made progress, in the sense of
> 
> a) A validation succeeded,
> b) An exec submission succeeded.
> 

Thanks for all of the explanations, makes more sense. Patch LGTM aside
from a couple of nits; with that:

Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx>

> /Thomas
> 
> > 
> > Matt
> > > +		cond_resched();
> > > +		spin_lock(&bdev->lru_lock);
> > > +	}
> > > +	spin_unlock(&bdev->lru_lock);
> > > +out:
> > > +	ttm_resource_cursor_fini(&cursor);
> > > +	return sofar;
> > > +}
> > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > index 6ccf96c91f3a..8b032298d66e 100644
> > > --- a/include/drm/ttm/ttm_bo.h
> > > +++ b/include/drm/ttm/ttm_bo.h
> > > @@ -190,6 +190,38 @@ struct ttm_operation_ctx {
> > >  	uint64_t bytes_moved;
> > >  };
> > > 
> > > +struct ttm_lru_walk;
> > > +
> > > +/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > > +struct ttm_lru_walk_ops {
> > > +	/**
> > > +	 * process_bo - Process this bo.
> > > +	 * @walk: struct ttm_lru_walk describing the walk.
> > > +	 * @bo: A locked and referenced buffer object.
> > > +	 *
> > > +	 * Return: Negative error code on error, number of processed pages on
> > > +	 * success. 0 also indicates success.
> > > +	 */
> > > +	long (*process_bo)(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo);
> > > +};
> > > +
> > > +/**
> > > + * struct ttm_lru_walk - Structure describing a LRU walk.
> > > + */
> > > +struct ttm_lru_walk {
> > > +	/** @ops: Pointer to the ops structure. */
> > > +	const struct ttm_lru_walk_ops *ops;
> > > +	/** @ctx: Pointer to the struct ttm_operation_ctx. */
> > > +	struct ttm_operation_ctx *ctx;
> > > +	/** @ticket: The struct ww_acquire_ctx if any. */
> > > +	struct ww_acquire_ctx *ticket;
> > > +	/** @trylock_only: Only use trylock for locking. */
> > > +	bool trylock_only;
> > > +};
> > > +
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long target);
> > > +
> > >  /**
> > >   * ttm_bo_get - reference a struct ttm_buffer_object
> > >   *
> > > -- 
> > > 2.44.0
> > > 
> > 
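
P.S. For anyone wiring this up in a driver, usage would presumably end
up looking something like the sketch below. This is only my reading of
the interfaces added in this patch; my_evict_bo(), my_walk_ops and
my_evict_pages() are made-up names, and the per-bo action is left as a
stand-in for whatever the driver actually does.

/* Hypothetical driver-side user of the LRU walker. */
static long my_process_bo(struct ttm_lru_walk *walk,
			  struct ttm_buffer_object *bo)
{
	/*
	 * Per-bo action supplied by the driver; per the process_bo()
	 * docs it returns pages of progress, 0 to skip, or a negative
	 * error code.
	 */
	return my_evict_bo(bo, walk->ctx);
}

static const struct ttm_lru_walk_ops my_walk_ops = {
	.process_bo = my_process_bo,
};

static int my_evict_pages(struct ttm_device *bdev,
			  struct ttm_resource_manager *man,
			  struct ttm_operation_ctx *ctx,
			  struct ww_acquire_ctx *ticket,
			  long nr_pages)
{
	struct ttm_lru_walk walk = {
		.ops = &my_walk_ops,
		.ctx = ctx,
		.ticket = ticket,	/* NULL means trylock-only walk. */
	};
	long progress;

	progress = ttm_lru_walk_for_evict(&walk, bdev, man, nr_pages);
	if (progress < 0)
		return progress;	/* -ENOSPC: exit the ww transaction. */

	return progress >= nr_pages ? 0 : -ENOMEM;
}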