On Sat 17-08-24 10:29:31, Yafang Shao wrote:
> On Fri, Aug 16, 2024 at 4:17 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > be removed from the tree altogether please? For the full context the
> > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@xxxxxxxxx/T/#u
> > ---
> > From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Fri, 16 Aug 2024 10:10:00 +0200
> > Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
> >
> > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > that such an allocation context is inherently unsafe if the context
> > doesn't fully control all allocations called from this context. Any
> > potential __GFP_NOFAIL request from within PF_MEMALLOC_NORECLAIM
> > context would BUG_ON if the allocation would fail.
> >
> > [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> >
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Documenting the risk is a good first step. For this change:
>
> Acked-by: Yafang Shao <laoar.shao@xxxxxxxxx>
>
> Even without the PF_MEMALLOC_NORECLAIM flag, the underlying risk
> remains, as users can still potentially set both ~__GFP_DIRECT_RECLAIM
> and __GFP_NOFAIL.

Users can configure all sorts of nonsensical gfp flag combinations. That
is a sad reality of the interface. But we do assume that kernel code is
reasonably sane. Besides that, Barry is working on making this less
likely by dropping __GFP_NOFAIL and replacing it with GFP_NOFAIL, which
always includes __GFP_DIRECT_RECLAIM. Sure, nothing will prevent callers
from clearing that flag explicitly, but we have no real defense against
broken code.

> PF_MEMALLOC_NORECLAIM does not create this risk; it
> only exacerbates it. The core problem lies in the complexity of the
> various GFP flags and the lack of robust management for them. While we
> have extensive documentation on these flags, it can still be
> confusing, particularly for new developers who haven't yet encountered
> real-world issues.
>
> For instance:
>
> * %GFP_NOWAIT is for kernel allocations that should not stall for direct
> * reclaim,
> #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
>
> Initially, it wasn't clear to me why setting __GFP_KSWAPD_RECLAIM and
> __GFP_NOWARN would prevent direct reclaim. It only became apparent
> after I studied the entire code path of page allocation. I believe
> other newcomers to kernel development may face similar confusion as I
> did early in my experience.
>
> The real issue we need to address is improving the management of these
> GFP flags, though I don't have a concrete solution at this time.

Welcome to the club. Changing this interface is a _huge_ undertaking.
Just have a look at how many users of the gfp flags we have in the
kernel. I can tell you from first-hand experience that even minor tweaks
are really hard to make.

--
Michal Hocko
SUSE Labs