On 16/10/21 20:10, Linus Torvalds wrote:
That said, I also do wonder if we could possibly change "kvcalloc()"
to avoid the warning. The reason I didn't like your patch is that
kvmalloc_node() only takes a "size_t", and the overflow condition
there is that "MAX_INT".
But the "kvcalloc()" case that takes a "number of elements and size"
should _conceptually_ warn not when the total size overflows, but when
either number or the element size overflows.
That makes sense, but the number could still overflow in KVM's case; the
size is small, just 8, it's the count that's humongous. In general,
users of kvcalloc of kvmalloc_array *should* not be doing
multiplications (that's the whole point of the functions), and that
lowers a lot the risk of overflows, but the safest way is to provide
a variant that does not warn. See the (compile-tested only) patch
below.
Pulling the WARN in the inline function is a bit ugly. For kvcalloc()
and kvmalloc_array(), one of the two is almost always constant, but
it is unlikely that the compiler eliminates both. The impact on a
localyesconfig build seems to be minimal though (about 150 bytes
larger out of 20 megabytes of code).
Paolo
---------------- 8< -----------------
From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Subject: [PATCH] mm: add kvmalloc variants that do not to warn
Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
restricted memory allocation with 'kvmalloc()' to sizes that fit
in an 'int', to protect against trivial integer conversion issues.
However, the WARN triggers with KVM when it allocates ancillary page
data, whose size essentially depends on whatever userspace has passed to
the KVM_SET_USER_MEMORY_REGION ioctl. The warnings are quickly found by
syzkaller, but they can also happen with huge but real-world VMs.
The largest allocation that KVM can do is 8 bytes per page of guest
memory, meaning a 1 TiB memslot will cause a warning even outside fuzzing.
In fact, Google already has VMs that create 1.5 TiB memslots (12 TiB of
total guest memory spread across 8 virtual NUMA nodes).
For kvcalloc() and kvmalloc_array(), Linus suggested warning if either
the number or the size are big. However, this would only move the
goalpost for KVM's warning without fully avoiding it. Therefore,
provide a "double underscore" version of kvcalloc(), kvmalloc_array()
and kvmalloc_node() that omits the check.
Cc: Willy Tarreau <w@xxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..92aba7327bd8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -799,7 +799,15 @@ static inline int is_vmalloc_or_module_addr(const void *x)
}
#endif
-extern void *kvmalloc_node(size_t size, gfp_t flags, int node);
+extern void *__kvmalloc_node(size_t size, gfp_t flags, int node);
+static inline void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+ /* Don't even allow crazy sizes */
+ if (WARN_ON(size > INT_MAX))
+ return NULL;
+ return __kvmalloc_node(size, flags, node);
+}
+
static inline void *kvmalloc(size_t size, gfp_t flags)
{
return kvmalloc_node(size, flags, NUMA_NO_NODE);
@@ -813,14 +821,31 @@ static inline void *kvzalloc(size_t size, gfp_t flags)
return kvmalloc(size, flags | __GFP_ZERO);
}
-static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline void *__kvmalloc_array(size_t n, size_t size, gfp_t flags)
{
size_t bytes;
if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
- return kvmalloc(bytes, flags);
+ return __kvmalloc_node(bytes, flags, NUMA_NO_NODE);
+}
+
+static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
+{
+ /*
+ * Don't allow crazy sizes here, either. For 64-bit,
+ * this also lets the compiler avoid the overflow check.
+ */
+ if (WARN_ON(size > INT_MAX || n > INT_MAX))
+ return NULL;
+
+ return __kvmalloc_array(n, size, flags);
+}
+
+static inline void *__kvcalloc(size_t n, size_t size, gfp_t flags)
+{
+ return __kvmalloc_array(n, size, flags | __GFP_ZERO);
}
static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
diff --git a/mm/util.c b/mm/util.c
index 499b6b5767ed..0406709d8097 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -558,7 +558,7 @@ EXPORT_SYMBOL(vm_mmap);
*
* Return: pointer to the allocated memory of %NULL in case of failure
*/
-void *kvmalloc_node(size_t size, gfp_t flags, int node)
+void *__kvmalloc_node(size_t size, gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
@@ -593,14 +593,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
if (ret || size <= PAGE_SIZE)
return ret;
- /* Don't even allow crazy sizes */
- if (WARN_ON_ONCE(size > INT_MAX))
- return NULL;
-
return __vmalloc_node(size, 1, flags, node,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(kvmalloc_node);
+EXPORT_SYMBOL(__kvmalloc_node);
/**
* kvfree() - Free memory.
So I would also accept a patch that just changes how "kvcalloc()"
works (or how "kvmalloc_array()" works).
It's a bit annoying how we've ended up losing that "n/size"
information by the time we hit kvmalloc().
Linus