Re: [PATCH 1/2] drm/ttm: rework ttm_tt page limit v2

Am 16.12.20 um 16:09 schrieb Daniel Vetter:
On Wed, Dec 16, 2020 at 03:04:26PM +0100, Christian König wrote:
[SNIP]
@@ -276,9 +277,9 @@ static void ttm_shrink(struct ttm_mem_global *glob, bool from_wq,
  	while (ttm_zones_above_swap_target(glob, from_wq, extra)) {
  		spin_unlock(&glob->lock);
-		ret = ttm_bo_swapout(ctx);
+		ret = ttm_bo_swapout(ctx, 0);
Generally we don't treat gfp_mask as a set of additional flags, but as the
full thing. So here we should have GFP_KERNEL.

Also having both the shrinker and the ttm_shrink_work is a bit much, the
shrink work should get deleted completely I think.

That's why I'm moving the shrinker work into VMWGFX with patch #2 :)

  		spin_lock(&glob->lock);
-		if (unlikely(ret != 0))
+		if (unlikely(ret < 0))
  			break;
  	}
@@ -453,6 +454,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
  			zone->name, (unsigned long long)zone->max_mem >> 10);
  	}
  	ttm_pool_mgr_init(glob->zone_kernel->max_mem/(2*PAGE_SIZE));
+	ttm_tt_mgr_init();
  	return 0;
  out_no_zone:
  	ttm_mem_global_release(glob);
@@ -466,6 +468,7 @@ void ttm_mem_global_release(struct ttm_mem_global *glob)
  	/* let the page allocator first stop the shrink work. */
  	ttm_pool_mgr_fini();
+	ttm_tt_mgr_fini();
  	flush_workqueue(glob->swap_queue);
  	destroy_workqueue(glob->swap_queue);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 7f75a13163f0..d454c428c56a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -38,6 +38,8 @@
  #include <drm/drm_cache.h>
  #include <drm/ttm/ttm_bo_driver.h>
+static struct shrinker mm_shrinker;
+
  /*
   * Allocates a ttm structure for the given BO.
   */
@@ -223,13 +225,23 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
  	return ret;
  }
-int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
+/**
+ * ttm_tt_swapout - swap out tt object
+ *
+ * @bdev: TTM device structure.
+ * @ttm: The struct ttm_tt.
+ * @gfp_flags: Flags to use for memory allocation.
+ *
+ * Swapout a TT object to a shmem_file, return number of pages swapped out or
+ * negative error code.
+ */
+int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm,
+		   gfp_t gfp_flags)
  {
  	struct address_space *swap_space;
  	struct file *swap_storage;
  	struct page *from_page;
  	struct page *to_page;
-	gfp_t gfp_mask;
  	int i, ret;
  	swap_storage = shmem_file_setup("ttm swap",
@@ -241,14 +253,14 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
  	}
  	swap_space = swap_storage->f_mapping;
-	gfp_mask = mapping_gfp_mask(swap_space);
+	gfp_flags |= mapping_gfp_mask(swap_space);
I don't think this combines flags correctly. mapping_gfp_mask is most
likely GFP_KERNEL or something like that, so __GFP_FS (the thing you want
to check for to avoid recursion) is always set.

Ah, ok. Thanks for the information; because of GFP_NOFS I was assuming that the mask works the other way around.

I think we need an & here, and maybe screaming if the gfp flags for the
swapout space are funny.

for (i = 0; i < ttm->num_pages; ++i) {
  		from_page = ttm->pages[i];
  		if (unlikely(from_page == NULL))
  			continue;
-		to_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_mask);
+		to_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_flags);
  		if (IS_ERR(to_page)) {
  			ret = PTR_ERR(to_page);
  			goto out_err;
@@ -263,7 +275,7 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
  	ttm->swap_storage = swap_storage;
  	ttm->page_flags |= TTM_PAGE_FLAG_SWAPPED;
-	return 0;
+	return ttm->num_pages;
out_err:
  	fput(swap_storage);
@@ -341,3 +353,63 @@ void ttm_tt_unpopulate(struct ttm_bo_device *bdev,
  		ttm_pool_free(&bdev->pool, ttm);
  	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
  }
+
+/* As long as pages are available make sure to release at least one */
+static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
+					  struct shrink_control *sc)
+{
+	struct ttm_operation_ctx ctx = {
+		.no_wait_gpu = true
Iirc there's an eventual shrinker limit where it gets desperate. I think
once we hit that, we should allow gpu waits. But it's not passed to
shrinkers for reasons, so maybe we should have a second round that tries
to more actively shrink objects if we fell substantially short of what
reclaim expected us to do?

I think we should try to avoid waiting for the GPU in the shrinker callback.

When we get HMM we will have cases where the shrinker is called from there, and then we can't wait for the GPU without causing deadlocks.


Also don't we have a trylock_only flag here to make sure drivers don't do
something stupid?

Mhm, I'm pretty sure drivers should only be minimally involved.

+	};
+	int ret;
+
+	if (sc->gfp_mask & GFP_NOFS)
+		return 0;
+
+	ret = ttm_bo_swapout(&ctx, GFP_NOFS);
+	return ret < 0 ? SHRINK_EMPTY : ret;
+}
+
+/* Return the number of pages available or SHRINK_EMPTY if we have none */
+static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
+					   struct shrink_control *sc)
+{
+	struct ttm_buffer_object *bo;
+	unsigned long num_pages = 0;
+	unsigned int i;
+
+	if (sc->gfp_mask & GFP_NOFS)
+		return 0;
The count function should always count, and I'm not seeing a reason why
you couldn't do that here ... Also my understanding is that GFP_NOFS never
goes into shrinkers (the NOFS comes from shrinkers originally only being
used for filesystem objects), so this is doubly redundant.

My understanding is that gfp_mask is just to convey the right zones and
stuff, so that your shrinker can try to shrink objects in the right zones.
Hence I think the check in the _scan() function should also be removed.

Also the non __ prefixed flags are the combinations callers are supposed
to look at. Memory reclaim code needs to look at the __GFP flags, see e.g.
gfpflags_allow_blocking() or fs_reclaim_acquire().

Ok got it. But don't we need to somehow avoid recursion here?
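On the recursion question: one possibility (just a sketch, not what the patch does) would be the scope API from <linux/sched/mm.h> instead of threading a reduced gfp mask through. Any allocation made inside a memalloc_nofs_save()/memalloc_nofs_restore() pair implicitly behaves as GFP_NOFS, so shmem can't recurse back into the shrinker through the FS:

```c
/* Sketch only: mark the swapout path as a no-FS-recursion scope,
 * instead of checking sc->gfp_mask inside the shrinker. Whether
 * SHRINK_STOP is the right failure return here is exactly the kind
 * of detail still under discussion. */
static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
					  struct shrink_control *sc)
{
	struct ttm_operation_ctx ctx = { .no_wait_gpu = true };
	unsigned int nofs;
	int ret;

	nofs = memalloc_nofs_save();
	ret = ttm_bo_swapout(&ctx, GFP_KERNEL);
	memalloc_nofs_restore(nofs);

	return ret < 0 ? SHRINK_STOP : ret;
}
```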

[SNIP]
+int ttm_tt_mgr_init(void);
+void ttm_tt_mgr_fini(void);
+
  #if IS_ENABLED(CONFIG_AGP)
  #include <linux/agp_backend.h>
For testing I strongly recommend a debugfs file to trigger this shrinker
completely, from the right lockdep context (i.e. using
fs_reclaim_acquire/release()). Much easier to test that way. See
i915_drop_caches_set() in i915_debugfs.c.

That way you can fully test it all without hitting anything remotely
resembling actual OOM, which tends to kill all kinds of things.

That's exactly the reason I was switching from sysfs to debugfs in the other patch set.

Ok, in this case I'm going to reorder all that stuff and send out the debugfs patches first.
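For reference, the debugfs hook could look roughly like this (a sketch modeled on i915_drop_caches_set(); the ttm_tt_debugfs_* names are invented for this example):

```c
/* Sketch: debugfs file that runs the shrinker under the reclaim
 * lockdep context, so recursion bugs show up without real OOM. */
static int ttm_tt_debugfs_shrink_set(void *data, u64 val)
{
	struct shrink_control sc = {
		.gfp_mask = GFP_KERNEL,
		.nr_to_scan = val,
	};

	fs_reclaim_acquire(GFP_KERNEL);
	ttm_tt_shrinker_scan(&mm_shrinker, &sc);
	fs_reclaim_release(GFP_KERNEL);

	return 0;
}
DEFINE_SIMPLE_ATTRIBUTE(ttm_tt_debugfs_shrink_fops, NULL,
			ttm_tt_debugfs_shrink_set, "%llu\n");
```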

Aside from the detail work I think this is going in the right direction.
-Daniel

Thanks,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



