On Mon, Mar 22, 2021 at 02:05:48PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > On 20.03.21 at 14:17, Daniel Vetter wrote:
> > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> > > > > On 19.03.21 at 20:06, Daniel Vetter wrote:
> > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > On 19.03.21 at 18:52, Daniel Vetter wrote:
> > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > >
> > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > the shrinker.
> > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > Yeah, if you'd put it in the right place :-)
> > > > > >
> > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > situation where there's really no other way out. For anything where you
> > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > My impression is just the other way around. You should try to avoid the
> > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > Where did you get that idea?
> > >
> > > Well from the kernel comment on GFP_NOFS:
> > >
> > >  * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
> > >  * Please try to avoid using this flag directly and instead use
> > >  * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
> > >  * recurse into the FS layer with a short explanation why. All allocation
> > >  * requests will inherit GFP_NOFS implicitly.
> >
> > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).
> >
> > Adding them, maybe I got confused.
>
> My impression is that the scoped API is preferred these days.
>
> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
>
> I'd probably need to spend a few months learning the DRM subsystem to
> have a more detailed opinion on whether passing GFP flags around explicitly
> or using the scope API is the better approach for your situation.

Atm it's a single allocation in the ttm shrinker that's already
explicitly using GFP_NOFS that we're talking about here. The scoped api
might make sense for gpu scheduler, where we really operate under
GFP_NOWAIT for somewhat awkward reasons. But also I thought at least for
GFP_NOIO you generally need a mempool and think about how you guarantee
forward progress anyway. Is that also a bit outdated thinking, and
nowadays we could operate under the assumption that this Just Works?
Given that GFP_NOFS seems to fall over already for us I'm not super sure
about that ...

> I usually defer to Michal on these kinds of questions.
>
> > > > The kernel is full of explicit gfp_t flag
> > > > passing to make this as explicit as possible. The memalloc_no* stuff
> > > > is just for when you go through entire subsystems and really can't
> > > > wire it through. I can't find the discussion anymore, but that was the
> > > > advice I got from mm/fs people.
> > > >
> > > > One reason is that generally a small GFP_KERNEL allocation never
> > > > fails.
> > > > But it absolutely can fail if it's in a memalloc_no* section,
> > > > and these kind of non-obvious non-local effects are a real pain in
> > > > testing and review. Hence explicit gfp_flag passing as much as
> > > > possible.
>
> I agree with this; it's definitely a problem with the scope API. I wanted
> to extend it to include GFP_NOWAIT, but if you do that, your chances of
> memory allocation failure go way up, so you really want to set __GFP_NOWARN
> too, but now you need to audit all the places that you're calling to be
> sure they really handle errors correctly.
>
> So I think I'm giving up on that patch set.

Yeah the auditing is what scares me, and why at least personally I
prefer explicit gfp flags. It's much easier to debug a lockdep splat
involving fs_reclaim than memory allocation failures leading to very
strange bugs because we're not handling the allocation failure properly
(or maybe not even at all).
-Daniel

>
> > > > > > > > If this is just to paper over the seq_printf doing the wrong allocations,
> > > > > > > > then just move that out from under the fs_reclaim_acquire/release part.
> > > > > > > No, that wasn't the problem.
> > > > > > >
> > > > > > > We have just seen too many failures to allocate pages for swapout and I think
> > > > > > > that would improve this because in a lot of cases we can then immediately
> > > > > > > swap things out instead of having to rely on upper layers.
> > > > > > Yeah, you broke it. Now the real shrinker is running with GFP_KERNEL,
> > > > > > because your memalloc_no is only around the debugfs function. And ofc it's
> > > > > > much easier to allocate with GFP_KERNEL, right until you deadlock :-)
> > > > > The problem here is that for example kswapd calls the shrinker without
> > > > > holding a FS lock as far as I can see.
> > > > >
> > > > > And it is rather sad that we can't optimize this case directly.
> > > > I'm still not clear what you want to optimize?
> > > > You can check for "is
> > > > this kswapd" in pf flags, but that sounds very hairy and fragile.
> > >
> > > Well we only need the NOFS flag when the shrinker callback really comes from
> > > a memory shortage in the FS subsystem, and that is rather unlikely.
> > >
> > > When we would allow all other cases to be able to directly IO the freed up
> > > pages to swap it would certainly help.
> >
> > tbh I'm not sure. i915-gem code has played tricks with special casing the
> > kswapd path, and they do kinda scare me at least. I'm not sure whether
> > there's not some hidden dependencies there that would make this a bad
> > idea. Like afaik direct reclaim can sometimes stall for kswapd to catch up
> > a bit, or at least did in the past (I think, really not much clue about
> > this)
> >
> > The other thing is that the fs_reclaim_acquire/release annotation really
> > only works well if you use it outside of the direct reclaim path too.
> > Otherwise it's not much better than just lots of testing. That pretty much
> > means you have to annotate the kswapd path.
> > -Daniel
> >
> > >
> > > Christian.
> > >
> > > > -Daniel
> > > >
> > > > > Anyway you are right if some caller doesn't use the memalloc_no*()
> > > > > approach we are busted.
> > > > >
> > > > > Going to change the patch to only not warn for the moment.
> > > > >
> > > > > Regards,
> > > > > Christian.
> > > > >
> > > > > > Shrinking is hard, there's no easy way out here.
> > > > > >
> > > > > > Cheers, Daniel
> > > > > >
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > >
> > > > > > > > __GFP_NOWARN should be there indeed I think.
> > > > > > > > -Daniel
> > > > > > > >
> > > > > > > > > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
> > > > > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > index 2f0833c98d2c..86fa3e82dacc 100644
> > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > @@ -369,7 +369,7 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> > > > > > > > >  	};
> > > > > > > > >  	int ret;
> > > > > > > > > -	ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> > > > > > > > > +	ret = ttm_bo_swapout(&ctx, GFP_KERNEL | __GFP_NOWARN);
> > > > > > > > >  	return ret < 0 ? SHRINK_EMPTY : ret;
> > > > > > > > >  }
> > > > > > > > > @@ -389,10 +389,13 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> > > > > > > > >  static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
> > > > > > > > >  {
> > > > > > > > >  	struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
> > > > > > > > > +	unsigned int flags;
> > > > > > > > >  	fs_reclaim_acquire(GFP_KERNEL);
> > > > > > > > > +	flags = memalloc_nofs_save();
> > > > > > > > >  	seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &sc),
> > > > > > > > >  		   ttm_tt_shrinker_scan(&mm_shrinker, &sc));
> > > > > > > > > +	memalloc_nofs_restore(flags);
> > > > > > > > >  	fs_reclaim_release(GFP_KERNEL);
> > > > > > > > >  	return 0;
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > dri-devel mailing list
> > > > > > > > > dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch