Re: [PATCH v5 05/12] drm/ttm: Provide a generic LRU walker helper

Matthew Brost <matthew.brost@xxxxxxxxx> · Wed, 19 Jun 2024 15:09:52 +0000

On Wed, Jun 19, 2024 at 09:31:33AM +0200, Thomas Hellström wrote:
> Hi, Matthew. 
> 
> Thanks for reviewing.
> 
> On Tue, 2024-06-18 at 22:11 +0000, Matthew Brost wrote:
> > On Tue, Jun 18, 2024 at 09:18:13AM +0200, Thomas Hellström wrote:
> > 
> > Replying to correct version...
> > 
> > > Provide a generic LRU walker in TTM, in the spirit of
> > > drm_gem_lru_scan()
> > > but building on the restartable TTM LRU functionality.
> > > 
> > > The LRU walker optionally supports locking objects as part of
> > > a ww mutex locking transaction, to mimic to some extent the
> > > current functionality in ttm. However any -EDEADLK return
> > > is converted to -ENOMEM, so that the driver will need to back
> > > off and possibly retry without being able to keep the
> > > ticket.
> > > 
> > 
> > Wouldn't the backoff be unlock everything but keep the ticket?
> 
> We can't do that (yet) since we don't have the drm_exec or similar
> functionality. The missing part is that if keep the ticket, it's in
> contended state which means we need to slow-lock the contending lock as
> the first lock. And the caller doesn't know which lock is the
> contending one....
> 
> That is all addressed in the RFC part of the series that I left out for
> now. This is only trying to mimic current functionality.
> 

Got it.

> > 
> > > v3:
> > > - Move the helper to core ttm.
> > > - Remove the drm_exec usage from it for now, it will be
> > >   reintroduced later in the series.
> > > v4:
> > > - Handle the -EALREADY case if ticketlocking.
> > > 
> > > Cc: Christian König <christian.koenig@xxxxxxx>
> > > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx>
> > > Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > Cc: <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_bo_util.c | 145
> > > ++++++++++++++++++++++++++++++
> > >  include/drm/ttm/ttm_bo.h          |  32 +++++++
> > >  2 files changed, 177 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > index 0b3f4267130c..45fcaf6f8644 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > @@ -768,3 +768,148 @@ int ttm_bo_pipeline_gutting(struct
> > > ttm_buffer_object *bo)
> > >  	ttm_tt_destroy(bo->bdev, ttm);
> > >  	return ret;
> > >  }
> > > +
> > > +static bool ttm_lru_walk_trylock(struct ttm_lru_walk *walk,
> > > +				 struct ttm_buffer_object *bo,
> > > +				 bool *needs_unlock)
> > > +{
> > > +	struct ttm_operation_ctx *ctx = walk->ctx;
> > > +
> > > +	*needs_unlock = false;
> > > +
> > > +	if (dma_resv_trylock(bo->base.resv)) {
> > > +		*needs_unlock = true;
> > > +		return true;
> > > +	}
> > > +
> > > +	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
> > > +		dma_resv_assert_held(bo->base.resv);
> > > +		return true;
> > > +	}
> > > +i
> > 
> > Any reason this is done after the try lock? Just kinda goofy as if
> > this
> > statement is true the dma_resv_trylock will always fail.
> 
> It should work either way. I guess I had viewed it as "trylock first,
> if that fails, attempt any exception". I guess if we want to optimize
> performance for shared lock implementations, moving it first might
> avoid the atomic trylock operation, but I wouldn't expect a noticeable
> difference.
>

Agree it works either way.

> > 
> > > +	return false;
> > > +}
> > > +
> > > +static int ttm_lru_walk_ticketlock(struct ttm_lru_walk *walk,
> > > +				   struct ttm_buffer_object *bo,
> > > +				   bool *needs_unlock)
> > > +{
> > > +	struct dma_resv *resv = bo->base.resv;
> > > +	int ret;
> > > +
> > 
> > I suppose we don't have asserts here like in Xe but if we did,
> > assert(walk->ticket)?
> 
> I agree. I think we'd really want a TTM assert or warning that could be
> compiled away. In any case, I only expect a single caller of this
> function.
> 
> > 
> > > +	if (walk->ctx->interruptible)
> > > +		ret = dma_resv_lock_interruptible(resv, walk-
> > > >ticket);
> > > +	else
> > > +		ret = dma_resv_lock(resv, walk->ticket);
> > > +
> > > +	if (!ret) {
> > > +		*needs_unlock = true;
> > > +		/* Only a single ticketlock per loop. */
> > > +		walk->ticket = NULL;
> > 
> > Can you explain this a bit more? I see that once the walk->ticket is
> > set
> > to NULL this function will not be called again (i.e. only try locking
> > will be used). I want to understand the reasoning for this.
> > 
> > It might be helpful for a more lengthly explaination in the comments
> > of
> > the code too.
> 
> I can add a more thorough explanation, Again, this is trying to mimic
> the current code, that does a walk of trylocking, then a single ticket
> lock more as a sort of "wait a while to be able to make progress" and
> then resorts to trylock again.
> 
> The real reason to avoid multiple ticketlocks here is that each have a
> chance of failing with -EDEADLK, (in particular with the -EDEADLK
> injection enabled) which would translate to an -ENOMEM in the caller.
> 

Makes sense.

> > 
> > > +	} else if (ret == -EDEADLK) {
> > > +		/* Caller needs to exit the ww transaction. */
> > > +		ret = -ENOSPC;
> > 
> > The commit message says -ENOMEM.
> 
> The -ENOSPC is converted to -ENOMEM for the driver in
> ttm_bo_validate(). I could explain a bit more in the commit message.
>

Forgot about that.

> > 
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool
> > > locked)
> > > +{
> > > +	if (locked)
> > > +		dma_resv_unlock(bo->base.resv);
> > > +}
> > > +
> > > +/**
> > > + * ttm_lru_walk_for_evict() - Perform a LRU list walk, with
> > > actions taken on
> > > + * valid items.
> > > + * @walk: describe the walks and actions taken
> > > + * @bdev: The TTM device.
> > > + * @man: The struct ttm_resource manager whose LRU lists we're
> > > walking.
> > > + * @target: The end condition for the walk.
> > > + *
> > > + * The LRU lists of @man are walk, and for each struct
> > > ttm_resource encountered,
> > > + * the corresponding ttm_buffer_object is locked and taken a
> > > reference on, and
> > > + * the LRU lock is dropped. the LRU lock may be dropped before
> > > locking and, in
> > > + * that case, it's verified that the item actually remains on the
> > > LRU list after
> > > + * the lock, and that the buffer object didn't switch resource in
> > > between.
> > > + *
> > > + * With a locked object, the actions indicated by @walk-
> > > >process_bo are
> > > + * performed, and after that, the bo is unlocked, the refcount
> > > dropped and the
> > > + * next struct ttm_resource is processed. Here, the walker relies
> > > on
> > > + * TTM's restartable LRU list implementation.
> > > + *
> > > + * Typically @walk->process_bo() would return the number of pages
> > > evicted,
> > > + * swapped or shrunken, so that when the total exceeds @target, or
> > > when the
> > > + * LRU list has been walked in full, iteration is terminated. It's
> > > also terminated
> > > + * on error. Note that the definition of @target is done by the
> > > caller, it
> > > + * could have a different meaning than the number of pages.
> > > + *
> > > + * Note that the way dma_resv individualization is done, locking
> > > needs to be done
> > > + * either with the LRU lock held (trylocking only) or with a
> > > reference on the
> > > + * object.
> > > + *
> > > + * Return: The progress made towards target or negative error code
> > > on error.
> > > + */
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long
> > > target)
> > > +{
> > > +	struct ttm_resource_cursor cursor;
> > > +	struct ttm_resource *res;
> > > +	long sofar = 0;
> > 
> > s/sofar/evicted?
> 
> That's not always the case. When used for eviction it is actually 0 if
> the new allocation failed, 1 on success. It should be interpreted as
> "progress towards target", so perhaps progress?
> 

progress sounds good.

> > 
> > > +	long lret;
> > > +
> > > +	spin_lock(&bdev->lru_lock);
> > > +	ttm_resource_manager_for_each_res(man, &cursor, res) {
> > > +		struct ttm_buffer_object *bo = res->bo;
> > > +		bool bo_needs_unlock = false;
> > > +		bool bo_locked = false;
> > > +		int mem_type;
> > > +
> > > +		if (!bo || bo->resource != res)
> > > +			continue;
> > > +
> > > +		if (ttm_lru_walk_trylock(walk, bo,
> > > &bo_needs_unlock))
> > > +			bo_locked = true;
> > > +		else if ((!walk->ticket) || walk->ctx->no_wait_gpu
> > > ||
> > 
> > Nit - (!walk->ticket) could just be !walk->ticket.
> 
> Will fix.
> 
> > 
> > > +			 walk->trylock_only)
> > > +			continue;
> > > +
> > > +		if (!ttm_bo_get_unless_zero(bo)) {
> > > +			ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +			continue;
> > > +		}
> > > +
> > 
> > This kinda goofy pattern too, typically in code a get_unless_zero is
> > done before trying to lock the object not after. Even odder here, the
> > could or could not be locked depending on the outcome of
> > ttm_lru_walk_trylock. This is covering individualization case? Would
> > it
> > make more sense to move ttm_bo_get_unless_zero before the try lock or
> > is
> > that to avoid a put on try lock failure + continue?
> 
> I guess this is still a remnant from the old code: ttm_bos can't be put
> in atomic context, so we'd had to unlock the lru_lock to put ( which we
> still do). However, without the restartable lists that we have with
> this series, list traversal would have to be reset.
> 
> If we were to change the order of trylock and get_unless_zero now, we
> could do that, but that would mean unnecessary refcounting and dropping
> the lru_lock in the case of trylock failure.
> 

Missed the putting in atomic context not allowed. Seems fine as is.

> 
> > 
> > > +		mem_type = res->mem_type;
> > > +		spin_unlock(&bdev->lru_lock);
> > > +
> > > +		lret = 0;
> > > +		if (!bo_locked && walk->ticket)
> > 
> > As above could you explain the ticket usage a bit more?
> > 
> > Also you shouldn't need to check the ticket here there is !walk-
> > >ticket
> > above which triggers a continue.
> 
> I hope I explained sufficiently above. Please get back otherwise, I'll
> remove the walk->ticket check.
> 

You did explain sufficiently.

> > 
> > > +			lret = ttm_lru_walk_ticketlock(walk, bo,
> > > &bo_needs_unlock);
> > > +
> > > +		/*
> > > +		 * Note that in between the release of the lru
> > > lock and the
> > > +		 * ticketlock, the bo may have switched resource,
> > > +		 * and also memory type, since the resource may
> > > have been
> > > +		 * freed and allocated again with a different
> > > memory type.
> > > +		 * In that case, just skip it.
> > > +		 */
> > > +		if (!lret && bo->resource == res && res->mem_type
> > > == mem_type)
> > > +			lret = walk->ops->process_bo(walk, bo);
> > > +
> > > +		ttm_lru_walk_unlock(bo, bo_needs_unlock);
> > > +		ttm_bo_put(bo);
> > > +		if (lret == -EBUSY || lret == -EALREADY)
> > > +			lret = 0;
> > 
> > What is usage of these error codes?
> > 
> > -EALREADY means the resv is locked with the current ticket, right?
> > Wouldn't we want to call process_bo in this case too?
> 
> No, then we might evict our own working set. Processing of shared bos
> is handled as before with ctx->allow_res_evict.
>

Ah, yes. Makes sense.

> > 
> > -EBUSY I need some help figuring out.
> 
> -EBUSY is the return of ttm_bo_wait_ctx() in case we're in no-wait-gpu
> context. Like direct reclaim. If the bo is then busy, we simply skip to
> the next bo. If process_bo() waits for gpu idle using the above
> function, this is to catch that error code.
> Should probably add that to the process_bo() docs.
>
> > 
> > > +		sofar = (lret < 0) ? lret : sofar + lret;
> > > +		if (sofar < 0 || sofar >= target)
> > > +			goto out;
> > > +
> > 
> > Here we have dropped the BO unlock. What prevents the BO from being
> > moved back to the resource we just evicted it from resulting in sofar
> > not being accurate?
> 
> For eviction, the byte-count is not used, but rather the success of
> allocating the new resource.
> 
> For shrinking, any process could allocate what we just shrunk, and even
> so shrinking does not always mean that the memory is immediately
> available so the byte-count will always be a rough estimate.
> 
> A related question is "what prevents anyone else from stealing memory
> that we just evicted", and the answer to that is "Nothing... yet". The
> exhaustive eviction part handles that with drm_exec keeping the bo
> locks of evicted objects until we make progress. Eventually we'd hold
> enough locks to block all other competitors.
> 
> Unrelated to this patch, me and Christian had a discussion on whether
> to unlock those when we made progress in the sense of
> 
> a) A validation succeeded,
> b) An exec submission succeeded.
> 

Thanks all of the explainations, makes more sense.

Patch LGTM aside from a couple of nits, with that:
Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx>

> /Thomas
> 
> > 
> > Matt
> > 
> > > +		cond_resched();
> > > +		spin_lock(&bdev->lru_lock);
> > > +	}
> > > +	spin_unlock(&bdev->lru_lock);
> > > +out:
> > > +	ttm_resource_cursor_fini(&cursor);
> > > +	return sofar;
> > > +}
> > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > index 6ccf96c91f3a..8b032298d66e 100644
> > > --- a/include/drm/ttm/ttm_bo.h
> > > +++ b/include/drm/ttm/ttm_bo.h
> > > @@ -190,6 +190,38 @@ struct ttm_operation_ctx {
> > >  	uint64_t bytes_moved;
> > >  };
> > >  
> > > +struct ttm_lru_walk;
> > > +
> > > +/** struct ttm_lru_walk_ops - Operations for a LRU walk. */
> > > +struct ttm_lru_walk_ops {
> > > +	/**
> > > +	 * process_bo - Process this bo.
> > > +	 * @walk: struct ttm_lru_walk describing the walk.
> > > +	 * @bo: A locked and referenced buffer object.
> > > +	 *
> > > +	 * Return: Negative error code on error, Number of
> > > processed pages on
> > > +	 * success. 0 also indicates success.
> > > +	 */
> > > +	long (*process_bo)(struct ttm_lru_walk *walk, struct
> > > ttm_buffer_object *bo);
> > > +};
> > > +
> > > +/**
> > > + * struct ttm_lru_walk - Structure describing a LRU walk.
> > > + */
> > > +struct ttm_lru_walk {
> > > +	/** @ops: Pointer to the ops structure. */
> > > +	const struct ttm_lru_walk_ops *ops;
> > > +	/** @ctx: Pointer to the struct ttm_operation_ctx. */
> > > +	struct ttm_operation_ctx *ctx;
> > > +	/** @ticket: The struct ww_acquire_ctx if any. */
> > > +	struct ww_acquire_ctx *ticket;
> > > +	/** @tryock_only: Only use trylock for locking. */
> > > +	bool trylock_only;
> > > +};
> > > +
> > > +long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct
> > > ttm_device *bdev,
> > > +			    struct ttm_resource_manager *man, long
> > > target);
> > > +
> > >  /**
> > >   * ttm_bo_get - reference a struct ttm_buffer_object
> > >   *
> > > -- 
> > > 2.44.0
> > > 
>