Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Hi,

On Mon, 2024-08-19 at 17:26 +0200, Christian König wrote:
> Am 19.08.24 um 16:14 schrieb Daniel Vetter:
> > On Mon, Aug 19, 2024 at 01:38:56PM +0200, Thomas Hellström wrote:
> > > Hi, Christian,
> > > 
> > > On Mon, 2024-08-19 at 13:03 +0200, Christian König wrote:
> > > > Am 06.08.24 um 10:29 schrieb Thomas Hellström:
> > > > > Hi, Christian.
> > > > > 
> > > > > On Thu, 2024-07-11 at 14:01 +0200, Christian König wrote:
> > > > > > Am 10.07.24 um 20:19 schrieb Matthew Brost:
> > > > > > > On Wed, Jul 10, 2024 at 02:42:58PM +0200, Christian König wrote:
> > > > > > > > That is something drivers really shouldn't mess with.
> > > > > > > > 
> > > > > > > Thomas uses this in Xe to implement a shrinker [1]. Seems to
> > > > > > > need to remain available for drivers.
> > > > > > No, that is exactly what I try to prevent with that.
> > > > > > 
> > > > > > This is an internal thing of TTM and drivers should never use
> > > > > > it directly.
> > > > > That driver-facing LRU walker is a direct response/solution to
> > > > > this comment that you made in the first shrinker series:
> > > > > 
> > > > > https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@xxxxxxx/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9
> > > > Ah, yeah, that was about how we should be avoiding middle-layer
> > > > design.
> > > > 
> > > > But a function which returns the next candidate for eviction and
> > > > a function which calls a callback for eviction are exactly opposite
> > > > approaches.
> > > > 
> > > > > That is also mentioned in the cover letter of the recent shrinker
> > > > > series, and this walker has been around in that shrinker series for
> > > > > more than half a year now, so if you think it's not the correct
> > > > > driver-facing API, IMHO that should be addressed by a review comment
> > > > > in that series rather than by posting a conflicting patch?
> > > > I actually outlined that in the review comments for the patch
> > > > series.
> > > > E.g. a walker function with a callback is basically a middle
> > > > layer.
> > > > 
> > > > What I outlined in the link above is that a function which returns
> > > > the next eviction candidate is a better approach than a callback.
> > > > 
> > > > > So assuming that we still want the driver to register the
> > > > > shrinker, IMO that helper abstracts away all the nasty locking and
> > > > > pitfalls for a driver-registered shrinker, and is similar in
> > > > > structure to, for example, the pagewalk helper in mm/pagewalk.c.
> > > > > 
> > > > > An alternative that could be tried as a driver-facing API is to
> > > > > provide a for_each_bo_in_lru_lock() macro where the driver
> > > > > open-codes "process_bo()" inside the for loop, but I tried this and
> > > > > found it quite fragile since the driver might exit the loop without
> > > > > performing the necessary cleanup.
> > > > The point is that the shrinker should *never* need to have context.
> > > > E.g. a walker which allows going over multiple BOs for eviction is
> > > > exactly the wrong approach for that.
> > > > 
> > > > The shrinker should always evict exactly one BO, and the next
> > > > invocation of a shrinker should not depend on the result of the
> > > > previous one.
> > > > 
> > > > Or am I missing something vital here?
> > > A couple of things,
> > > 
> > > 1) I'd like to think of the middle-layer vs. helper distinction as:
> > > the helper has its own ops, and can be used optionally by the driver.
> > > IIRC, the atomic modesetting / pageflip ops are implemented in exactly
> > > this way.
> > > 
> > > Sometimes a certain loop operation can't be easily, or at least not
> > > robustly, implemented using a for_each_...() approach; mm/pagewalk.c
> > > is an example. In this shrinking case I think it's probably possible
> > > using scoped_guard() from cleanup.h. This way we could get rid of this
> > > middle-layer discussion by turning the interface inside-out:
> > > 
> > > for_each_bo_on_lru_locked(xxx)
> > > 	driver_shrink();
> > > 
> > > But I do think the currently suggested approach is less fragile and
> > > less prone to driver error.
> > > 
> > > FWIW, in addition to the above examples, drm_gem_lru_scan also works
> > > like this.
> > An iteration state structure (like drm_connector_iter) plus a macro
> > for the common case that uses cleanup.h should get the job done.
> 
> Yeah, completely agree. It basically boils down to which one needs to
> pack a state bag.
> 
> In a mid-layer design it's the driver or the caller of functions; the
> much-hated init callback in the DRM layer was a perfect example of that.
> 
> In a non-mid-layer approach it's the framework or the called function;
> examples are how the fence iterators in dma_resv or the drm_connector,
> plane, etc. iterators work.
> 
> One big difference between the two approaches is where and by whom
> things like preparation and cleanup are done, e.g. who takes locks.
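
For reference, the non-mid-layer iterator shape referred to above looks
roughly like the drm_connector iterator below; this is just a minimal
example of the existing drm_connector_list_iter API to show where the
state bag lives (inside @iter, not in the caller):

#include <drm/drm_connector.h>
#include <drm/drm_device.h>

/* Count connectors; the point is only the begin/for_each/end shape, with
 * all iteration state (refcounting, restart handling) kept in @iter. */
static unsigned int count_connectors(struct drm_device *dev)
{
	struct drm_connector_list_iter iter;
	struct drm_connector *connector;
	unsigned int n = 0;

	drm_connector_list_iter_begin(dev, &iter);
	drm_for_each_connector_iter(connector, &iter)
		n++;
	drm_connector_list_iter_end(&iter);

	return n;
}

A TTM LRU iterator in the same shape would presumably hand back one
referenced BO per iteration, with the lru_lock dance hidden inside the
cursor.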

So what, in your opinion, is an acceptable solution here?
The "get next object to shrink" approach won't work, since it's subject
to the old TTM swap problem now removed:
If shrinking fails we will get the same object upon subsequent calls.
If we bump LRU we could end up with infinite loops.
So IMO we need to be able to loop. I don't really care wether we do
this as an explicit loop or whether we use the LRU walker, but I think
from a maintainability point-of-view it is better to keep LRU walking
in a single place.

If we return an unlocked object, we'd need to refcount it and drop the
LRU lock, but maybe that's not a bad thing.

But what's the main drawback of exporting the existing helper?
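
To make the alternative concrete: a "return the next candidate, referenced
and unlocked" API used from the driver side could look roughly like the
sketch below. ttm_lru_next_candidate() and driver_shrink_bo() are made-up
names for illustration only; ttm_bo_reserve()/ttm_bo_unreserve()/
ttm_bo_put() are the existing TTM calls:

#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_resource.h>

/* Sketch only: ttm_lru_next_candidate() is assumed to return the next LRU
 * bo with a reference held and the LRU lock already dropped, or NULL when
 * the list is exhausted. driver_shrink_bo() stands in for the driver's
 * actual shrinking and returns the number of pages it released. */
static long driver_shrink_some(struct ttm_resource_manager *man, long target)
{
	struct ttm_buffer_object *bo;
	long freed = 0;

	while (freed < target && (bo = ttm_lru_next_candidate(man))) {
		/* Trylock; skip busy objects rather than stalling the walk. */
		if (!ttm_bo_reserve(bo, false, true, NULL)) {
			freed += driver_shrink_bo(bo);
			ttm_bo_unreserve(bo);
		}
		ttm_bo_put(bo);
	}

	return freed;
}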

> 
> > > 2) The shrinkers suggest shrinking a number of pages, not a single
> > > bo; again, drm_gem_lru_scan works like this, and i915 works like
> > > this. I think we should align with those.
> > Yeah that's how shrinkers work, so if we demidlayer then it really
> > needs to be a loop in the driver, not a "here's the next bo to nuke",
> > I think.
> 
> Huh? Well, that's not how I understand shrinkers.
> 
> The shrinker gives you the maximum number of objects to scan, not how
> many pages to free. In return you give back the actual number of
> objects scanned and pages freed.
> 
> As far as I know the number of objects is meant in the sense of SLUBs,
> or rather of different LRU lists.
> 
> So for BO shrinking we should probably only free, or rather unpin, a
> single BO per callback, otherwise we mess up the fairness between
> shrinkers in the MM layer.

Hm. AFAICT all drm shrinkers use pages as objects, and the docs for
nr_to_scan say it's the number of objects to scan and try to reclaim.
I think this strategy has had a fair amount of testing with the i915
driver.
In any case, I don't think TTM should enforce a different way of
shrinking by means of a severely restricted helper?
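
For illustration, a driver scan callback where nr_to_scan is treated as a
number of pages, as in i915 and the GEM shrinkers, might look roughly like
the sketch below. ttm_shrink_pages() is a placeholder for whatever helper
TTM ends up exporting (LRU walker or otherwise), assumed to shrink BOs
until about that many pages are released and to return the number actually
freed; struct driver_device is likewise hypothetical, while private_data,
nr_to_scan and SHRINK_STOP are the existing shrinker API:

#include <linux/shrinker.h>

static unsigned long driver_shrinker_scan(struct shrinker *shrink,
					  struct shrink_control *sc)
{
	struct driver_device *dev = shrink->private_data;	/* hypothetical */
	unsigned long freed;

	/* "Objects" are pages here: loop over as many BOs as needed. */
	freed = ttm_shrink_pages(dev, sc->nr_to_scan);	/* placeholder */

	return freed ?: SHRINK_STOP;
}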

/Thomas




