Re: [PATCH v6 12/12] drm/xe: Increase the XE_PL_TT watermark

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Fri, 09 Aug 2024 14:22:19 +0200

On Wed, 2024-08-07 at 23:13 +0000, Matthew Brost wrote:
> On Mon, Aug 05, 2024 at 12:35:34PM -0600, Souza, Jose wrote:
> > On Wed, 2024-07-03 at 17:38 +0200, Thomas Hellström wrote:
> > > The XE_PL_TT watermark was set to 50% of system memory.
> > > The idea behind that was unclear since the net effect is that
> > > TT memory will be evicted to TTM_PL_SYSTEM memory if that
> > > watermark is exceeded, requiring PPGTT rebinds and dma
> > > remapping. But there is no similar watermark for TTM_PL_SYSTEM
> > > memory.
> > > 
> > > The TTM functionality that tries to swap out system memory to
> > > shmem objects if a 50% limit of total system memory is reached
> > > is orthogonal to this, and with the shrinker added, it's no
> > > longer in effect.
> > > 
> > > Replace the 50% TTM_PL_TT limit with a 100% limit, in effect
> > > allowing all graphics memory to be bound to the device unless it
> > > has been swapped out by the shrinker.
> > 
> > Sorry if I missed some patch changing it but I did not found in
> > this series anything changing the 50% limit in ttm_global_init().
> > When I debugged some Vulkan tests allocate a lot of memory, the
> > reason that KMD was not allocating memory wash this ttm_global
> > limit that is shared
> > with all devices using TTM.
> > 
> 
> I'm reviewing this series and starting make sense of all this.
> 
> Thomas please correct me if I'm wrong here...
> 
> The limit set in ttm_global_init is the watermark for the TTM pool
> where
> if exceeded upon freeing a BO's pages the pages are actually freed
> rather than just returning to the TTM pool cache. The global
> watermark
> is reason why in issue #2438 it observed a bunch of memory is still
> consumed when nothing is running or any BOs exist - pages are being
> cached in the TTM pool. 

This is correct.

> The global watermark doesn't actually limit the
> amount system memory TTM can allocate. A shrinker also exists which
> can
> free cached pages in the TTM pool if memory pressure exists or 'echo
> 3 >
> /proc/sys/vm/drop_caches' is done.

Yes, this is also true except the global watermark should be called the
pool watermark.

There is another global watermark that, if the amount of pages used for
graphics (PL_SYSTEM or PL_TT) exceeds 50% of system, bo swapping
starts. That means bos are pulled of the various LRU lists in a device
round-robin fashion and moved to shmem objects, in the anticipation
that theses shmem objects can then be paged out to disc by the core.
However, this is done even if there is no disc swap-space attached.
Also if this shmem swapping fails, nothing happens but the allocation
is free to proceed anyway.

This is what I typically refer to as the global watermark. It used to
be implemented by an opportunistic swapper process (similar to kswapd)
and a direct swapper (similar to direct reclaim) and the 50% limit was
configurable, but much of that functionality was ripped out. I didn't
follow the discussion preceding that change, though.

> 
> The watermark changed in this patch, is the actual limit for the
> number
> of pages we can allocate for BOs.

What is changed in this patch is actually the amount to memory we can
have in the TT placement and therefore also bound to the device. If
this limit is exceeded, eviction from TT to SYSTEM will take place in
addition to the above global swapping. This limit is per device. So now
we can in theory have all of system bound to a device.

>  With a shrinker hooked into BOs, we
> now can freely allocate all of the system pages for BOs and if memory
> pressure exists idle BOs pages are swapped to shmem via the shrinker
> and
> restored upon next GPU use.

Exactly.

> 
> Matt

/Thomas

> 
> > > 
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > index 9844a8edbfe1..d38b91872da3 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > > @@ -108,9 +108,8 @@ int xe_ttm_sys_mgr_init(struct xe_device *xe)
> > >  	u64 gtt_size;
> > >  
> > >  	si_meminfo(&si);
> > > +	/* Potentially restrict amount of TT memory here. */
> > >  	gtt_size = (u64)si.totalram * si.mem_unit;
> > > -	/* TTM limits allocation of all TTM devices by 50% of
> > > system memory */
> > > -	gtt_size /= 2;
> > >  
> > >  	man->use_tt = true;
> > >  	man->func = &xe_ttm_sys_mgr_func;
> >