Re: [PATCH 3/3] drm/ttm: make up to 90% of system memory available

Daniel Vetter <daniel@xxxxxxxx> · Tue, 17 Nov 2020 18:19:00 +0100

On Tue, Nov 17, 2020 at 03:06:15PM +0100, Christian König wrote:
> Increase the amount of system memory drivers can use to about 90% of
> the total available.
> 
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index a958135cb3fe..0a93df93dba4 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1267,7 +1267,7 @@ static int ttm_bo_global_init(void)
>  	 * the available system memory.
>  	 */
>  	num_pages = (u64)si.totalram * si.mem_unit;
> -	num_pages = (num_pages * 50 / 100) >> PAGE_SHIFT;
> +	num_pages = (num_pages * 90 / 100) >> PAGE_SHIFT;

I don't think this is the design we want. As long as it was set at "half
of system memory" it was clear that a) it's a hack and b) precision didn't
matter.

But if you go to the limit and still want to keep the "we make sure
there's no OOM" guarantee, then precision starts to matter:
- memory hotplug and hotunplug are a thing
- userspace can mlock, and the limit for that is configurable
- drivers can pin_user_pages for IO and random other stuff. Some of it is
  accounted as some subsystem-specific mlock (like rdma does iirc), some
  is just yolo or short-term enough (like)
- none of what we do here takes into consideration any interactions with
  core mm tracking (like cgroups or numa or anything like that)

If we want to drop the "half of system ram" limit (and yes that makes
sense) I think the right design is:

- we give up on the "no OOM" guarantee.

- This means if you want real isolation of tasks, we need cgroups, and we
  need to integrate ttm cgroups with system memory cgroups somehow. Unlike
  randomly picked hardcoded limits, this should work a lot more reliably
  and be a lot more useful in the field.

- This also means that drivers start to fail in interesting ways. I think
  locking headaches are covered with all the lockdep annotations I've
  pushed, plus some of the things I still have in flight (I have a
  might_alloc() annotation somewhere). That leaves validation of error
  paths for when allocations fail. IME a very effective technique we used
  in i915 is (ab)using EINTR restarting, which the drmIoctl uapi requires
  userspace to support. We could put a debug mode into ttm_tt that
  randomly fails with -EINTR to make sure it's all working correctly
  (plus anything else that allocates memory), and unlike real
  out-of-memory injection, piglit and any other CTS will complete without
  failure. That gives us an excellent metric for "does it work". Actual
  OOM, even an injected one, tends to make stuff blow up in ways that are
  very hard to track, and makes it hard to be sure you've got good
  coverage, since all your usual tests die pretty quickly.

- ttm_tt needs to play fair with every other system memory user. We need
  to register a shrinker for each ttm_tt (so usually one per device I
  guess), which walks the lru (in shrink_count) and uses dma_resv_trylock
  for actual shrinking. We probably want to move it to SYSTEM first for
  that shrinker to pick up, so that there's some global fairness across
  all ttm_tt.

- For GFP_DMA32 that means zone-aware shrinkers. We've never used those in
  practice, because thus far i915 hasn't had any big need for low memory.
  But it's supposed to be a thing.

It's a bit more code than the one-liner above, but I also think it's a lot
more solid. Plus it would resolve the last big issue where i915 gem is
fairly fundamentally different from ttm. On that front, I think once
Maarten's locking rework for i915 has landed and all the other ttm rework
from you and Dave is in, we'll have resolved them all.

>  	/* But for DMA32 we limit ourself to only use 2GiB maximum. */
>  	num_dma32_pages = (u64)(si.totalram - si.totalhigh) * si.mem_unit;
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel