Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Fri, 16 Apr 2021 09:08:51 +0200




On 15.04.21 at 22:33, Andrew Morton wrote:
On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" <ckoenig.leichtzumerken@xxxxxxxxx> wrote:

@@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
  			for (j = 0; j < MAX_ORDER; ++j)
  				ttm_pool_type_fini(&pool->caching[i].orders[j]);
  	}
+
+	/* We removed the pool types from the LRU, but we need to also make sure
+	 * that no shrinker is concurrently freeing pages from the pool.
+	 */
+	sync_shrinkers();

It isn't immediately clear to me how this works.  ttm_pool_fini() has
already freed all the pages hasn't it?  So why would it care if some
shrinkers are still playing with the pages?

Yes, ttm_pool_fini() has freed all pages that were in the pool when the
function was called.

But the problem is that a shrinker running in parallel may have taken a
page from the pool and still be in the process of freeing it.

Once ttm_pool_fini() returns, the pool structure and especially the
device structure get freed, so without waiting here the shrinker running
in parallel could still be using them.
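
The race can be modelled in userspace. What follows is a minimal sketch, not the TTM or mm code: it assumes a pthread rwlock that every shrinker holds for reading while it runs, and all demo_* names are made up for illustration. The teardown path first hides the pool from new shrinker runs, then takes and drops the lock for writing, which only succeeds once every shrinker that already started has finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a lock every shrinker holds while it runs. */
static pthread_rwlock_t demo_shrinker_lock = PTHREAD_RWLOCK_INITIALIZER;
/* Protects the single-entry "LRU" below. */
static pthread_mutex_t demo_lru_lock = PTHREAD_MUTEX_INITIALIZER;

struct demo_pool {
	atomic_int pages_freed;	/* written by the shrinker while freeing */
};

static struct demo_pool *demo_lru;	/* pool visible to the shrinker */
static atomic_int demo_shrinker_running;

static void *demo_shrinker(void *unused)
{
	struct timespec delay = { 0, 50 * 1000 * 1000 };	/* 50 ms */
	struct demo_pool *pool;

	(void)unused;
	pthread_rwlock_rdlock(&demo_shrinker_lock);

	/* Take a "page" from the pool while it is still on the LRU. */
	pthread_mutex_lock(&demo_lru_lock);
	pool = demo_lru;
	pthread_mutex_unlock(&demo_lru_lock);
	atomic_store(&demo_shrinker_running, 1);

	if (pool) {
		nanosleep(&delay, NULL);	/* freeing takes a while */
		atomic_fetch_add(&pool->pages_freed, 1); /* still uses pool */
	}

	pthread_rwlock_unlock(&demo_shrinker_lock);
	return NULL;
}

/* Wait until every shrinker that has already started is done. */
static void demo_sync_shrinkers(void)
{
	pthread_rwlock_wrlock(&demo_shrinker_lock);
	pthread_rwlock_unlock(&demo_shrinker_lock);
}

/* Tear the pool down while a shrinker is in flight. Returns the number of
 * pages the shrinker had finished freeing when the pool was released,
 * or -1 on setup failure.
 */
static int demo_pool_fini(void)
{
	struct demo_pool *pool = malloc(sizeof(*pool));
	pthread_t thread;
	int freed;

	if (!pool)
		return -1;
	atomic_init(&pool->pages_freed, 0);
	atomic_store(&demo_shrinker_running, 0);
	demo_lru = pool;

	if (pthread_create(&thread, NULL, demo_shrinker, NULL)) {
		demo_lru = NULL;
		free(pool);
		return -1;
	}
	while (!atomic_load(&demo_shrinker_running))
		sched_yield();	/* wait until the shrinker uses the pool */

	/* Remove the pool from the LRU: new shrinker runs cannot find it. */
	pthread_mutex_lock(&demo_lru_lock);
	demo_lru = NULL;
	pthread_mutex_unlock(&demo_lru_lock);

	/* Without this call, free() below would race with the shrinker. */
	demo_sync_shrinkers();

	freed = atomic_load(&pool->pages_freed);
	assert(freed == 1);	/* the in-flight shrinker has finished */
	free(pool);
	pthread_join(thread, NULL);
	return freed;
}
```

Calling demo_pool_fini() from a main() should return 1. Removing the demo_sync_shrinkers() call would let free(pool) run while demo_shrinker() still writes to the pool, which is the use-after-free described above.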

I could go for a design with one shrinker per device instead, but in my
opinion that would put a bit too much pressure on the pool.

Or is it the case that ttm_pool_fini() is assuming that there will be
some further action against these pages, which requires that shrinkers
no longer be accessing the pages and which further assumes that future
shrinker invocations will not be able to look up these pages?

IOW, a bit more explanation about the dynamics here would help!

Sorry, I'm not a native speaker of English and sometimes still have a 
hard time explaining things.

Regards,
Christian.