Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Mon, 22 Mar 2021 20:34:25 +0100

On 22.03.21 at 18:02, Daniel Vetter wrote:
On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
On 20.03.21 at 14:17, Daniel Vetter wrote:
On Sat, Mar 20, 2021 at 10:04 AM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
On 19.03.21 at 20:06, Daniel Vetter wrote:
On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
On 19.03.21 at 18:52, Daniel Vetter wrote:
On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
Don't print a warning when we fail to allocate a page for swapping things out.

Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
Uh this part doesn't make sense. Especially since you only do it for the
debugfs file, not in general. Which means you've just completely broken
the shrinker.
Are you sure? My impression is that GFP_NOFS should now work much more out
of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
Yeah, if you'd put it in the right place :-)

But also, the -mm folks are very clear that the memalloc_no*() family is for
dire situations where there's really no other way out. For anything where you
know what you're doing, you really should use explicit gfp flags.
My impression is just the other way around. You should try to avoid the
NOFS/NOIO flags and use the memalloc_no* approach instead.
Where did you get that idea?
Well from the kernel comment on GFP_NOFS:

  * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
  * Please try to avoid using this flag directly and instead use
  * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
  * recurse into the FS layer with a short explanation why. All allocation
  * requests will inherit GFP_NOFS implicitly.
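
For illustration only, the "inherit GFP_NOFS implicitly" behaviour described
in that comment can be modelled in plain userspace C. This is a sketch, not
kernel code: every model_* function and MODEL_* constant is hypothetical, the
flag values are made up, and a thread-local variable stands in for the
per-task flags word. It assumes the scope API works by setting a per-task
flag that the allocator consults to strip the FS bit from each request.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel definitions; the values are arbitrary. */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_IO            0x1u
#define MODEL_GFP_FS            0x2u
#define MODEL_GFP_KERNEL        (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_MEMALLOC_NOFS  0x1u

/* Models the per-task flags word; one copy per thread. */
static _Thread_local unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state of the scope flag. */
static unsigned int model_nofs_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

    model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
    return old;
}

/* Leave a NOFS scope by putting back whatever state save() returned. */
static void model_nofs_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Mask the allocator would effectively use for a request made right now. */
static gfp_model_t model_effective_gfp(gfp_model_t requested)
{
    if (model_task_flags & MODEL_PF_MEMALLOC_NOFS)
        requested &= ~MODEL_GFP_FS;
    return requested;
}

/*
 * Nested scopes: the inner restore must not end the outer scope, so a
 * callee asking for the full mask still gets it with the FS bit removed.
 */
static gfp_model_t model_scope_demo(void)
{
    unsigned int outer = model_nofs_save();
    unsigned int inner = model_nofs_save();
    gfp_model_t seen;

    model_nofs_restore(inner);      /* still inside the outer scope */
    seen = model_effective_gfp(MODEL_GFP_KERNEL);
    assert((seen & MODEL_GFP_FS) == 0);

    model_nofs_restore(outer);      /* back to the caller's state */
    return seen;
}
```

Calling model_scope_demo() outside of any scope returns a mask with only the
IO bit set, even though the request asked for the full mask. With explicit
flags, every callee would instead have to be handed the NOFS mask by its
caller.
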
Huh that's interesting, since iirc Willy or Dave told me the opposite, and
the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).

Adding them, maybe I got confused.
My impression is that the scoped API is preferred these days.

https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html

I'd probably need to spend a few months learning the DRM subsystem to
have a more detailed opinion on whether passing GFP flags around explicitly
or using the scope API is the better approach for your situation.
Yes, in an ideal world we would have a clearly defined scope of the
reclaim recursion wrt FS/IO associated with it. I went back to
https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@xxxxxxx/
and two things stand out. Why does ttm_tt_debugfs_shrink_show
really require NOFS semantics? And why does it play with
fs_reclaim_acquire?
It's our shrinker. shrink_show simply triggers that specific shrinker
asking it to shrink everything it can, which helps a lot with testing
without having to drive the entire system against the OOM wall.
fs_reclaim_acquire is there to make sure lockdep understands that this
is a shrinker and that it checks all the dependencies for us as if
we were in real reclaim. There are some drop-caches interfaces in proc
iirc, but those drop everything, and they don't have the fs_reclaim
annotations to teach lockdep about what we're doing.

To summarize, the debugfs code is basically there to test whether that
stuff really works with GFP_NOFS.

My only concern is that if I could rely on memalloc_no* being used, we
could optimize this quite a bit further.

Regards,
Christian.

-Daniel
