On Tue, Sep 03, 2024 at 09:06:17AM GMT, Michal Hocko wrote:
> On Mon 02-09-24 18:32:33, Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 02:52:52PM GMT, Andrew Morton wrote:
> > > On Mon, 2 Sep 2024 05:53:59 -0400 Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> > >
> > > > On Mon, Sep 02, 2024 at 11:51:48AM GMT, Michal Hocko wrote:
> > > > > The previous version has been posted in [1]. Based on the review feedback
> > > > > I have sent v2 of patches in the same thread but it seems that the
> > > > > review has mostly settled on these patches. There is still an open
> > > > > discussion on whether having a NORECLAIM allocator semantic (compare to
> > > > > atomic) is worthwhile or how to deal with broken GFP_NOFAIL users but
> > > > > those are not really relevant to this particular patchset as it 1)
> > > > > doesn't aim to implement either of the two and 2) it aims at spreading
> > > > > PF_MEMALLOC_NORECLAIM use while it doesn't have a properly defined
> > > > > semantic now that it is not widely used and much harder to fix.
> > > > >
> > > > > I have collected Reviewed-bys and reposting here. These patches are
> > > > > touching bcachefs, VFS and core MM so I am not sure which tree to merge
> > > > > this through but I guess going through Andrew makes the most sense.
> > > > >
> > > > > Changes since v1:
> > > > > - compile fixes
> > > > > - rather than dropping PF_MEMALLOC_NORECLAIM alone reverted eab0af905bfc
> > > > >   ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") suggested
> > > > >   by Matthew.
> > > >
> > > > To reiterate:
> > > >
> > >
> > > It would be helpful to summarize your concerns.
> > >
> > > What runtime impact do you expect this change will have upon bcachefs?
> >
> > For bcachefs: I try really hard to minimize tail latency and make
> > performance robust in extreme scenarios - thrashing. A large part of
> > that is that btree locks must be held for no longer than necessary.
> >
> > We definitely don't want to recurse into other parts of the kernel,
> > taking other locks (i.e. in memory reclaim) while holding btree locks;
> > that's a great way to stack up (and potentially multiply) latencies.
>
> OK, these two patches do not fail to do that. The only existing user is
> turned into GFP_NOWAIT so the final code works the same way. Right?

https://lore.kernel.org/linux-mm/20240828140638.3204253-1-kent.overstreet@xxxxxxxxx/

> > But gfp flags don't work with vmalloc allocations (and that's unlikely
> > to change), and we require vmalloc fallbacks for e.g. btree node
> > allocation. That's the big reason we want MEMALLOC_PF_NORECLAIM.
>
> Have you even tried to reach out to vmalloc maintainers and asked for
> GFP_NOWAIT support for vmalloc? Because I do not remember that. Sure
> kernel page tables have hardcoded GFP_KERNEL context which slightly
> complicates that but that doesn't really mean the only potential
> solution is to use a per task flag to override that. Just from top of my
> head we can consider pre-allocating virtual address space for
> non-sleeping allocations. Maybe there are other options that only people
> deeply familiar with the vmalloc internals can see.

That sounds really overly complicated.