On Sat, Jun 10, 2023 at 1:40 PM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > > syzbot is reporting lockdep warning in __stack_depot_save(), for > __kasan_record_aux_stack() is passing GFP_NOWAIT which will result in > calling wakeup_kcompactd() from wakeup_kswapd() from wake_all_kswapds() > from __alloc_pages_slowpath(). > > Strictly speaking, __kasan_record_aux_stack() is responsible for removing > __GFP_KSWAPD_RECLAIM flag in order not to wake kswapd which in turn wakes > kcompactd. But since KASAN and KMSAN functions might be called with > arbitrary locks held, we should consider removing __GFP_KSWAPD_RECLAIM > flag from KASAN and KMSAN. And this patch goes one step further; let's > remove __GFP_KSWAPD_RECLAIM flag in the __stack_depot_save() side, based > on the following reasons. > > Reason 1: > > Currently, __stack_depot_save() has "alloc_flags &= ~GFP_ZONEMASK;" line > which is pointless because "alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);" > line will also zero out zone modifiers. Good catch, we indeed do not need the GFP_ZONEMASK line now. But looks like you'll need it at least in the __GFP_NOFAIL branch? > But why is __stack_depot_save() > trying to mask gfp flags supplied by the caller? > > I guess that __stack_depot_save() tried to be as robust as possible. But > __stack_depot_save() is a debugging function where all callers have to > be able to survive allocation failures. This, but also the allocation should not deadlock. E.g. KMSAN can call __stack_depot_save() from almost any function in the kernel, so we'd better avoid heavyweight memory reclaiming, because that in turn may call __stack_depot_save() again. > > Reason 2: > > __stack_depot_save() from stack_depot_save() is also called by > ref_tracker_alloc() from __netns_tracker_alloc() from > netns_tracker_alloc() from get_net_track(), and some of get_net_track() > users are passing GFP_ATOMIC because waking kswapd/kcompactd is safe. > But even if we mask __GFP_KSWAPD_RECLAIM flag at __stack_depot_save(), > it is very likely that allocations with __GFP_KSWAPD_RECLAIM flag happen > somewhere else by the moment __stack_depot_save() is called for the next > time. > > Therefore, not waking kswapd/kcompactd when doing allocation for > __stack_depot_save() will be acceptable from the memory reclaim latency > perspective. Ack. > While we are at it, let's make __stack_depot_save() accept __GFP_NORETRY > and __GFP_RETRY_MAYFAIL flags, based on the following reason. Looks like you're accepting a whole bunch of flags in addition to __GFP_NORETRY and __GFP_RETRY_MAYFAIL - maybe list the two explicitly? > Reason 3: > > Since DEPOT_POOL_ORDER is defined as 2, we must mask __GFP_NOFAIL flag > in order not to complain rmqueue(). But masking __GFP_NORETRY flag and > __GFP_RETRY_MAYFAIL flag might be overkill. > > The OOM killer might be needlessly invoked due to order-2 allocation if > GFP_KERNEL is supplied by the caller, despite the caller might have > passed GFP_KERNEL for doing order-0 allocation. As you noted above, stackdepot is a debug feature anyway, so invoking OOM killer because there is no memory for an order-2 allocation might be an acceptable behavior? > Allocation for order-2 might stall if GFP_NOFS or GFP_NOIO is supplied > by the caller, despite the caller might have passed GFP_NOFS or GFP_NOIO > for doing order-0 allocation. What if the caller passed GFP_NOFS to avoid calling back into FS, and discarding that flag would result in a recursion? Same for GFP_NOIO. > Generally speaking, I feel that doing order-2 allocation from > __stack_depot_save() with gfp flags supplied by the caller is an > unexpected behavior for the callers. We might want to use only order-0 > allocation, and/or stop using gfp flags supplied by the caller... Right now stackdepot allows the following list of flags: __GFP_HIGH, __GFP_KSWAPD_RECLAIM, __GFP_DIRECT_RECLAIM, __GFP_IO, __GFP_FS. We could restrict it further to __GFP_HIGH | __GFP_DIRECT_RECLAIM to be on the safe side - plus allow __GFP_NORETRY and __GFP_RETRY_MAYFAIL. > Reported-by: syzbot <syzbot+ece2915262061d6e0ac1@xxxxxxxxxxxxxxxxxxxxxxxxx> > Closes: https://syzkaller.appspot.com/bug?extid=ece2915262061d6e0ac1 > Suggested-by: Alexander Potapenko <glider@xxxxxxxxxx> > Cc: Huang, Ying <ying.huang@xxxxxxxxx> > Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > --- > Changes in v3: > Huang, Ying thinks that masking __GFP_KSWAPD_RECLAIM flag in the callers > side is preferable > ( https://lkml.kernel.org/r/87fs7nyhs3.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ). > But Alexander Potapenko thinks that masking __GFP_KSWAPD_RECLAIM flag > in the callee side would be the better > ( https://lkml.kernel.org/r/CAG_fn=UTTbkGeOX0teGcNOeobtgV=mfGOefZpV-NTN4Ouus7xA@xxxxxxxxxxxxxx ). > I took Alexander's suggestion, and added reasoning for masking > __GFP_KSWAPD_RECLAIM flag in the callee side. > > Changes in v2: > Mask __GFP_KSWAPD_RECLAIM flag in the callers, suggested by Huang, Ying > ( https://lkml.kernel.org/r/87edn92jvz.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ). > > lib/stackdepot.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c > index 2f5aa851834e..33ebefaa7074 100644 > --- a/lib/stackdepot.c > +++ b/lib/stackdepot.c > @@ -405,7 +405,10 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries, > * contexts and I/O. > */ > alloc_flags &= ~GFP_ZONEMASK; > - alloc_flags &= (GFP_ATOMIC | GFP_KERNEL); > + if (!(alloc_flags & __GFP_DIRECT_RECLAIM)) > + alloc_flags &= __GFP_HIGH; > + else > + alloc_flags &= ~__GFP_NOFAIL; > alloc_flags |= __GFP_NOWARN; > page = alloc_pages(alloc_flags, DEPOT_POOL_ORDER); > if (page) > -- > 2.18.4 > >