Re: [PATCH] mm: allow huge kvmalloc() calls if they're accounted to memcg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Oct 17, 2021 at 01:17:56AM +0200, Paolo Bonzini wrote:
> On 16/10/21 20:10, Linus Torvalds wrote:
> > That said, I also do wonder if we could possibly change "kvcalloc()"
> > to avoid the warning. The reason I didn't like your patch is that
> > kvmalloc_node() only takes a "size_t", and the overflow condition
> > there is that "MAX_INT".
> > 
> > But the "kvcalloc()" case that takes a "number of elements and size"
> > should _conceptually_ warn not when the total size overflows, but when
> > either number or the element size overflows.
> 
> That makes sense, but the number could still overflow in KVM's case; the
> size is small, just 8, it's the count that's humongous.  In general,
> users of kvcalloc of kvmalloc_array *should* not be doing
> multiplications (that's the whole point of the functions), and that
> lowers a lot the risk of overflows, but the safest way is to provide
> a variant that does not warn.  See the (compile-tested only) patch
> below.
> 
> Pulling the WARN in the inline function is a bit ugly.  For kvcalloc()
> and kvmalloc_array(), one of the two is almost always constant, but
> it is unlikely that the compiler eliminates both.  The impact on a
> localyesconfig build seems to be minimal though (about 150 bytes
> larger out of 20 megabytes of code).
> 
> Paolo
> 
> ---------------- 8< -----------------
> From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Subject: [PATCH] mm: add kvmalloc variants that do not to warn
> 
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
> However, the WARN triggers with KVM when it allocates ancillary page
> data, whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl.  The warnings are quickly found by
> syzkaller, but they can also happen with huge but real-world VMs.
> The largest allocation that KVM can do is 8 bytes per page of guest
> memory, meaning a 1 TiB memslot will cause a warning even outside fuzzing.
> In fact, Google already has VMs that create 1.5 TiB memslots (12 TiB of
> total guest memory spread across 8 virtual NUMA nodes).
> 
> For kvcalloc() and kvmalloc_array(), Linus suggested warning if either
> the number or the size are big.  However, this would only move the
> goalpost for KVM's warning without fully avoiding it.  Therefore,
> provide a "double underscore" version of kvcalloc(), kvmalloc_array()
> and kvmalloc_node() that omits the check.
> 
> Cc: Willy Tarreau <w@xxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Ah, so memcg wasn't doing sanity checks?

Is there a cheap way to resolve the question "does this much memory
exist"? The "__" versions end up lacking context for why they're "__"
versions. I.e. do we want something more descriptive, like
__huge_kvmalloc_node() or __unbounded_kvmalloc_node()?

-Kees

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..92aba7327bd8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -799,7 +799,15 @@ static inline int is_vmalloc_or_module_addr(const void *x)
>  }
>  #endif
> -extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
> +extern void *__kvmalloc_node(size_t size, gfp_t flags, int node);
> +static inline void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	/* Don't even allow crazy sizes */
> +	if (WARN_ON(size > INT_MAX))
> +		return NULL;
> +	return __kvmalloc_node(size, flags, node);
> +}
> +
>  static inline void *kvmalloc(size_t size, gfp_t flags)
>  {
>  	return kvmalloc_node(size, flags, NUMA_NO_NODE);
> @@ -813,14 +821,31 @@ static inline void *kvzalloc(size_t size, gfp_t flags)
>  	return kvmalloc(size, flags | __GFP_ZERO);
>  }
> -static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +static inline void *__kvmalloc_array(size_t n, size_t size, gfp_t flags)
>  {
>  	size_t bytes;
>  	if (unlikely(check_mul_overflow(n, size, &bytes)))
>  		return NULL;
> -	return kvmalloc(bytes, flags);
> +	return __kvmalloc_node(bytes, flags, NUMA_NO_NODE);
> +}
> +
> +static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> +{
> +	/*
> +	 * Don't allow crazy sizes here, either.  For 64-bit,
> +	 * this also lets the compiler avoid the overflow check.
> +	 */
> +	if (WARN_ON(size > INT_MAX || n > INT_MAX))
> +		return NULL;
> +
> +	return __kvmalloc_array(n, size, flags);
> +}
> +
> +static inline void *__kvcalloc(size_t n, size_t size, gfp_t flags)
> +{
> +	return __kvmalloc_array(n, size, flags | __GFP_ZERO);
>  }
>  static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
> diff --git a/mm/util.c b/mm/util.c
> index 499b6b5767ed..0406709d8097 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -558,7 +558,7 @@ EXPORT_SYMBOL(vm_mmap);
>   *
>   * Return: pointer to the allocated memory of %NULL in case of failure
>   */
> -void *kvmalloc_node(size_t size, gfp_t flags, int node)
> +void *__kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
> @@ -593,14 +593,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> -	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> -		return NULL;
> -
>  	return __vmalloc_node(size, 1, flags, node,
>  			__builtin_return_address(0));
>  }
> -EXPORT_SYMBOL(kvmalloc_node);
> +EXPORT_SYMBOL(__kvmalloc_node);
>  /**
>   * kvfree() - Free memory.
> 
> > So I would also accept a patch that just changes how "kvcalloc()"
> > works (or how "kvmalloc_array()" works).
> > 
> > It's a bit annoying how we've ended up losing that "n/size"
> > information by the time we hit kvmalloc().
> > 
> >                 Linus
> > 
> 

-- 
Kees Cook



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux