Hi, Christian,

Ping? Can I get an ack to proceed with this?

Thanks,
Thomas

On Wed, 2024-09-18 at 14:57 +0200, Thomas Hellström wrote:
> Sima, Christian,
> 
> I've updated the shrinker series now with a guarded for_each macro
> instead:
> 
> https://patchwork.freedesktop.org/patch/614514/?series=131815&rev=9
> 
> (Note I forgot to remove the export of the previous LRU walker).
> 
> So the midlayer argument is no longer an issue. The cleanup.h guard
> provides some additional protection against drivers exiting the LRU
> loop early.
> 
> So what remains is the question of whether the driver is allowed to
> discard a bo that TTM suggests for shrinking.
> 
> Arguments for:
> 
> 1) Not allowing that would require teaching TTM about purgeable
> objects.
> 2) Devices that need the blitter during shrinking would want to punt
> runtime_pm_get() to kswapd to avoid sleeping in direct reclaim.
> 3) If those devices end up blitting (LNL) to be able to shrink, they
> would want to punt waiting for the fence to signal to kswapd, to
> avoid waiting in direct reclaim.
> 4) It looks like we need to resort to folio_trylock in the shmem
> backup backend when shrinking is called with gfp_t = GFP_NOFS. A
> failing trylock will require a new bo.
> 
> Arguments against:
> None really. I thought the idea of demidlayering was to allow the
> driver more freedom.
> 
> So any feedback is appreciated. If that is found acceptable we can
> proceed with reviewing this patch and also with the shrinker series.
> 
> Thanks,
> Thomas
> 
> 
> On Mon, 2024-09-02 at 13:07 +0200, Daniel Vetter wrote:
> > On Wed, Aug 28, 2024 at 02:20:34PM +0200, Christian König wrote:
> > > On 27.08.24 at 19:53, Daniel Vetter wrote:
> > > > On Tue, Aug 27, 2024 at 06:52:13PM +0200, Daniel Vetter wrote:
> > > > > On Thu, Aug 22, 2024 at 03:19:29PM +0200, Christian König wrote:
> > > > > > Completely agree that this is complicated, but I still don't
> > > > > > see the need for it.
> > > > > > 
> > > > > > Drivers just need to use pm_runtime_get_if_in_use() inside
> > > > > > the shrinker and postpone all hw activity until resume.
> > > > > 
> > > > > Not good enough, at least long term I think. Also, postponing
> > > > > hw activity to resume doesn't solve the deadlock issue if you
> > > > > still need to grab ttm locks on resume.
> > > > 
> > > > Pondered this specific aspect some more, and I think you still
> > > > have a race here (even if you avoid the deadlock): if the
> > > > conditional rpm_get call fails, there's no guarantee that the
> > > > device will suspend/resume and clean up the GART mapping.
> > > 
> > > Well, I think we have a major disconnect here. When the device is
> > > powered down there is no GART mapping to clean up any more.
> > > 
> > > In other words, the GART is a table in local memory (VRAM); when
> > > the device is powered down this table is completely destroyed. Any
> > > BO which was mapped inside this table is now not mapped any more.
> > > 
> > > So when the shrinker wants to evict a BO which is marked as mapped
> > > to GART and the device is powered down, we just skip the GART
> > > unmapping part because that has already implicitly happened during
> > > power down.
> > > 
> > > Before mapping any BO into the GART again we power the GPU up
> > > through the runtime PM calls. And while powering it up again the
> > > GART is restored.
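
For illustration, here is a minimal sketch of the conditional runtime-PM
pattern described above. Only pm_runtime_get_if_in_use() and
pm_runtime_put_autosuspend() are real kernel helpers; struct my_dev,
struct my_bo, my_bo_unbind_gart() and my_bo_release_pages() are
hypothetical stand-ins, not code from the series or from any driver:

#include <linux/pm_runtime.h>

struct my_dev {
        struct device *dev;     /* device whose runtime-PM state we piggyback on */
};

struct my_bo {
        struct my_dev *mdev;
        bool gart_bound;        /* sw bookkeeping: bo currently has a GART mapping */
};

/* Hypothetical driver helpers, assumed to exist elsewhere. */
void my_bo_unbind_gart(struct my_bo *bo);
long my_bo_release_pages(struct my_bo *bo);

static long my_shrink_one_bo(struct my_bo *bo)
{
        struct my_dev *mdev = bo->mdev;
        /*
         * Never wake the device from reclaim; only take the rpm
         * reference if the device is already powered and in use.
         */
        int active = pm_runtime_get_if_in_use(mdev->dev);

        if (active > 0) {
                /*
                 * Device is powered, so the GART entry is live and has
                 * to be torn down before the pages can go away.
                 */
                if (bo->gart_bound)
                        my_bo_unbind_gart(bo);
                pm_runtime_put_autosuspend(mdev->dev);
        }
        /*
         * If active <= 0 the device is suspended or about to suspend;
         * per the argument above, the GART table in VRAM is destroyed
         * on power down, so the unmap is skipped and the bo is only
         * marked unbound in software.
         */
        bo->gart_bound = false;

        return my_bo_release_pages(bo);
}

The unmap that gets skipped when the conditional get fails is exactly
the step whose correctness is debated below.
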
> > 
> > My point is that you can't tell whether the device will power down
> > or not, you can only tell whether there's a chance it might be
> > powering down, and so you can't get at the rpm reference without
> > deadlock issues.
> > 
> > > > The race gets a bit smaller if you use
> > > > pm_runtime_get_if_active(), but even then you might catch it
> > > > right when resume has almost finished.
> > > 
> > > What race are you talking about?
> > > 
> > > The worst thing which could happen is that we restore a GART entry
> > > which isn't needed any more, but that is pretty much irrelevant
> > > since we only clear them to avoid some hw bugs.
> > 
> > The race I'm seeing is where you thought the GART entry is not an
> > issue and tossed an object, but the device didn't suspend, so it
> > might still use it.
> > 
> > I guess if we're clearly separating the sw allocation of the TTM_TT
> > from the physical entries in the GART, that should all work, but it
> > feels a bit tricky. The race I've seen is essentially these two
> > getting out of sync.
> > 
> > So maybe it was me who's stuck.
> > 
> > What I wonder is whether it works in practice, since on the restore
> > side you need to take some locks to figure out which gart mappings
> > exist and need restoring. And those are the same locks the shrinker
> > needs to figure out whether it might need to reap a gart mapping.
> > 
> > Or do you just copy the gart entries over and restore them exactly
> > as-is, so that there are no shared locks?
> > 
> > > > That means we'll have ttm bos hanging around with GART
> > > > allocations/mappings which aren't actually valid anymore (since
> > > > they might escape the cleanup upon resume due to the race). That
> > > > doesn't feel like a solid design either.
> > > 
> > > I'm most likely missing something, but I'm really scratching my
> > > head over where you see a problem here.
> > 
> > I guess one issue is that, at least traditionally, igfx drivers have
> > nested runtime pm within the dma_resv lock, and dgpu drivers the
> > other way round. Which is a bit awkward if you're trying for common
> > code.
> > 
> > Cheers, Sima
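
As a footnote to that last point, a rough sketch of the two lock-nesting
orders mentioned: only dma_resv_lock()/dma_resv_unlock() and the
pm_runtime_* calls are real kernel API; the function names and the
placeholder hardware access are hypothetical:

#include <linux/dma-resv.h>
#include <linux/pm_runtime.h>

/*
 * igfx-style ordering: the runtime-PM reference is taken while the
 * reservation lock is already held.
 */
static int bind_igfx_style(struct device *dev, struct dma_resv *resv)
{
        int ret = dma_resv_lock(resv, NULL);

        if (ret)
                return ret;
        pm_runtime_get_sync(dev);
        /* ... touch the hardware, e.g. write GART entries ... */
        pm_runtime_put(dev);
        dma_resv_unlock(resv);
        return 0;
}

/*
 * dgpu-style ordering: the runtime-PM reference is taken first and the
 * reservation lock only afterwards.
 */
static int bind_dgpu_style(struct device *dev, struct dma_resv *resv)
{
        int ret;

        pm_runtime_get_sync(dev);
        ret = dma_resv_lock(resv, NULL);
        if (!ret) {
                /* ... touch the hardware ... */
                dma_resv_unlock(resv);
        }
        pm_runtime_put(dev);
        return ret;
}

Common code has to pick one of the two orders, which inverts the
established ordering of the other driver family; that is the
awkwardness referred to above.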