Re: [PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Wed, 28 Aug 2024 17:45:32 +0200

On Wed, 2024-08-28 at 17:25 +0200, Christian König wrote:
> Am 28.08.24 um 16:05 schrieb Thomas Hellström:
> > On Wed, 2024-08-28 at 14:20 +0200, Christian König wrote:
> > > Am 27.08.24 um 19:53 schrieb Daniel Vetter:
> > > > On Tue, Aug 27, 2024 at 06:52:13PM +0200, Daniel Vetter wrote:
> > > > > On Thu, Aug 22, 2024 at 03:19:29PM +0200, Christian König
> > > > > wrote:
> > > > > > Completely agree that this is complicated, but I still
> > > > > > don't
> > > > > > see the need
> > > > > > for it.
> > > > > > 
> > > > > > Drivers just need to use pm_runtime_get_if_in_use() inside
> > > > > > the
> > > > > > shrinker and
> > > > > > postpone all hw activity until resume.
> > > > > Not good enough, at least long term I think. Also postponing
> > > > > hw
> > > > > activity
> > > > > to resume doesn't solve the deadlock issue, if you still need
> > > > > to
> > > > > grab ttm
> > > > > locks on resume.
> > > > Pondered this specific aspect some more, and I think you still
> > > > have
> > > > a race
> > > > here (even if you avoid the deadlock): If the condiditional
> > > > rpm_get
> > > > call
> > > > fails there's no guarantee that the device will suspend/resume
> > > > and
> > > > clean
> > > > up the GART mapping.
> > > Well I think we have a major disconnect here. When the device is
> > > powered
> > > down there is no GART mapping to clean up any more.
> > > 
> > > In other words GART is a table in local memory (VRAM) when the
> > > device
> > > is
> > > powered down this table is completely destroyed. Any BO which was
> > > mapped
> > > inside this table is now not mapped any more.
> > > 
> > > So when the shrinker wants to evict a BO which is marked as
> > > mapped to
> > > GART and the device is powered down we just skip the GART
> > > unmapping
> > > part
> > > because that has already implicitly happened during power down.
> > > 
> > > Before mapping any BO into the GART again we power the GPU up
> > > through
> > > the runtime PM calls. And while powering it up again the GART is
> > > restored.
> > I think you're forgetting the main Xe use-case of Lunar-lake
> > compression metadata. I'ts retained by the device during D3hot, but
> > cannot, at that time, be accessed for shrinking.
> 
> Yeah, that is really something we don't have an equivalent for on AMD
> GPUs.
> 
> When the ASIC is powered down VRAM is basically dead as well because
> it 
> won't get refreshed any more.
> 
> > And copying it all out "Just in case" when transitioning to D3hot
> > just
> > isn't a viable solution.
> 
> I would say that this is solvable with a hierarchy of power
> management 
> functionality.
> 
> E.g. the runtime PM interface works the same for you as it does for 
> amdgpu with evicting TTM BOs etc....
> 
> Then separate from runtime PM you have a reference count for the 
> accessibility of compressed metadata. And while shrinking you only 
> resume this specific part.

Well this looks like exactly what have with current hardware, or rather
*current* hardware that needs to take the bo lock during runtime pm
doesn't need this type of metadata access, which is currently Lunar-
lake only.

So from a runtime pm deadlock point-of-view we're safe. But then we
need to blit out the metadata and we get a fence to wait for, which
we've been told to avoid in direct reclaim but rather is preferred from
kswapd.

And I must admit I still don't get what the problem is with allowing a
driver LRU loop? The fact that some AMD eviction_valuable()
implementations made bad decisions by far doesn't weigh up for  the
loss of flexibility IMO.

/Thomas

> 
> Christian.
> 
> > 
> > /Thomas
> > 
>