+cc Linus, vmalloc reviewers On Sun, Oct 20, 2024 at 09:00:07AM -0400, Kent Overstreet wrote: > On Sun, Oct 20, 2024 at 12:45:33PM +0100, Lorenzo Stoakes wrote: > > On Sat, Oct 19, 2024 at 05:00:37PM -0400, Kent Overstreet wrote: > > > A user with a 75 TB filesystem reported the following journal replay > > > error: > > > https://github.com/koverstreet/bcachefs/issues/769 > > > > > > In journal replay we have to sort and dedup all the keys from the > > > journal, which means we need a large contiguous allocation. Given that > > > the user has 128GB of ram, the 2GB limit on allocation size has become > > > far too small. > > > > > > Cc: Vlastimil Babka <vbabka@xxxxxxx> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx> > > > --- > > > mm/util.c | 6 ------ > > > 1 file changed, 6 deletions(-) > > > > > > diff --git a/mm/util.c b/mm/util.c > > > index 4f1275023eb7..c60df7723096 100644 > > > --- a/mm/util.c > > > +++ b/mm/util.c > > > @@ -665,12 +665,6 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) > > > if (!gfpflags_allow_blocking(flags)) > > > return NULL; > > > > > > - /* Don't even allow crazy sizes */ > > > - if (unlikely(size > INT_MAX)) { > > > - WARN_ON_ONCE(!(flags & __GFP_NOWARN)); > > > - return NULL; > > > - } > > > - > > > > Err, and not replace it with _any_ limit? That seems very unwise. > > large allocations will go to either the page allocator or vmalloc, and > they have their own limits. Ah actually I misread it here, I see the allocation gets immediately sent off to __kmalloc_node_noprof() and thus that'll apply its own limits before doing this check prior to the vmalloc call. We actually do have a basic check in __vmalloc_node_range_noprof() that prevents _totally_ insane requests, checking that size >> PAGE_SHIFT <= totalram_pages(), so we shouldn't get anything too stupid here (I am thinking especially of ptr + size overflow type situations). But Linus explicitly introduced this INT_MAX check in commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls"), presumably for a reason, so have cc'd him here in case he has an objection to this which amounts to a revert of that patch. Assuming Linus doesn't object, I don't see how this is really doing anything different than just invoking __vmalloc_node_range_noprof() direct which we do in quite a few places anyway? I guess let's wait and see what he says or if Vlastimil/the vmalloc reviewers have any thoughts on this. But looks sane to me otherwise. > > although I should have a look at that, and make sure we're not > triggering the > MAX_ORDER warning in the page allocator unnecessarily w > hen we could just call vmalloc(). OK I guess tha would be a check prior to invoking __kmalloc_node_noprof() -> ... -> __alloc_pages_noprof() and WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp) ? Sounds sensible.