On Sun, Oct 17, 2021 at 01:17:56AM +0200, Paolo Bonzini wrote: > On 16/10/21 20:10, Linus Torvalds wrote: > > That said, I also do wonder if we could possibly change "kvcalloc()" > > to avoid the warning. The reason I didn't like your patch is that > > kvmalloc_node() only takes a "size_t", and the overflow condition > > there is that "MAX_INT". > > > > But the "kvcalloc()" case that takes a "number of elements and size" > > should _conceptually_ warn not when the total size overflows, but when > > either number or the element size overflows. > > That makes sense, but the number could still overflow in KVM's case; the > size is small, just 8, it's the count that's humongous. In general, > users of kvcalloc of kvmalloc_array *should* not be doing > multiplications (that's the whole point of the functions), and that > lowers a lot the risk of overflows, but the safest way is to provide > a variant that does not warn. See the (compile-tested only) patch > below. > > Pulling the WARN in the inline function is a bit ugly. For kvcalloc() > and kvmalloc_array(), one of the two is almost always constant, but > it is unlikely that the compiler eliminates both. The impact on a > localyesconfig build seems to be minimal though (about 150 bytes > larger out of 20 megabytes of code). > > Paolo > > ---------------- 8< ----------------- > From: Paolo Bonzini <pbonzini@xxxxxxxxxx> > Subject: [PATCH] mm: add kvmalloc variants that do not to warn > > Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > restricted memory allocation with 'kvmalloc()' to sizes that fit > in an 'int', to protect against trivial integer conversion issues. > However, the WARN triggers with KVM when it allocates ancillary page > data, whose size essentially depends on whatever userspace has passed to > the KVM_SET_USER_MEMORY_REGION ioctl. The warnings are quickly found by > syzkaller, but they can also happen with huge but real-world VMs. > The largest allocation that KVM can do is 8 bytes per page of guest > memory, meaning a 1 TiB memslot will cause a warning even outside fuzzing. > In fact, Google already has VMs that create 1.5 TiB memslots (12 TiB of > total guest memory spread across 8 virtual NUMA nodes). > > For kvcalloc() and kvmalloc_array(), Linus suggested warning if either > the number or the size are big. However, this would only move the > goalpost for KVM's warning without fully avoiding it. Therefore, > provide a "double underscore" version of kvcalloc(), kvmalloc_array() > and kvmalloc_node() that omits the check. > > Cc: Willy Tarreau <w@xxxxxx> > Cc: Kees Cook <keescook@xxxxxxxxxxxx> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> Ah, so memcg wasn't doing sanity checks? Is there a cheap way to resolve the question "does this much memory exist"? The "__" versions end up lacking context for why they're "__" versions. I.e. do we want something more descriptive, like __huge_kvmalloc_node() or __unbounded_kvmalloc_node()? -Kees > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 73a52aba448f..92aba7327bd8 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -799,7 +799,15 @@ static inline int is_vmalloc_or_module_addr(const void *x) > } > #endif > -extern void *kvmalloc_node(size_t size, gfp_t flags, int node); > +extern void *__kvmalloc_node(size_t size, gfp_t flags, int node); > +static inline void *kvmalloc_node(size_t size, gfp_t flags, int node) > +{ > + /* Don't even allow crazy sizes */ > + if (WARN_ON(size > INT_MAX)) > + return NULL; > + return __kvmalloc_node(size, flags, node); > +} > + > static inline void *kvmalloc(size_t size, gfp_t flags) > { > return kvmalloc_node(size, flags, NUMA_NO_NODE); > @@ -813,14 +821,31 @@ static inline void *kvzalloc(size_t size, gfp_t flags) > return kvmalloc(size, flags | __GFP_ZERO); > } > -static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags) > +static inline void *__kvmalloc_array(size_t n, size_t size, gfp_t flags) > { > size_t bytes; > if (unlikely(check_mul_overflow(n, size, &bytes))) > return NULL; > - return kvmalloc(bytes, flags); > + return __kvmalloc_node(bytes, flags, NUMA_NO_NODE); > +} > + > +static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags) > +{ > + /* > + * Don't allow crazy sizes here, either. For 64-bit, > + * this also lets the compiler avoid the overflow check. > + */ > + if (WARN_ON(size > INT_MAX || n > INT_MAX)) > + return NULL; > + > + return __kvmalloc_array(n, size, flags); > +} > + > +static inline void *__kvcalloc(size_t n, size_t size, gfp_t flags) > +{ > + return __kvmalloc_array(n, size, flags | __GFP_ZERO); > } > static inline void *kvcalloc(size_t n, size_t size, gfp_t flags) > diff --git a/mm/util.c b/mm/util.c > index 499b6b5767ed..0406709d8097 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -558,7 +558,7 @@ EXPORT_SYMBOL(vm_mmap); > * > * Return: pointer to the allocated memory of %NULL in case of failure > */ > -void *kvmalloc_node(size_t size, gfp_t flags, int node) > +void *__kvmalloc_node(size_t size, gfp_t flags, int node) > { > gfp_t kmalloc_flags = flags; > void *ret; > @@ -593,14 +593,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > if (ret || size <= PAGE_SIZE) > return ret; > - /* Don't even allow crazy sizes */ > - if (WARN_ON_ONCE(size > INT_MAX)) > - return NULL; > - > return __vmalloc_node(size, 1, flags, node, > __builtin_return_address(0)); > } > -EXPORT_SYMBOL(kvmalloc_node); > +EXPORT_SYMBOL(__kvmalloc_node); > /** > * kvfree() - Free memory. > > > So I would also accept a patch that just changes how "kvcalloc()" > > works (or how "kvmalloc_array()" works). > > > > It's a bit annoying how we've ended up losing that "n/size" > > information by the time we hit kvmalloc(). > > > > Linus > > > -- Kees Cook