On Thu, Sep 05, 2019 at 07:50:26PM +1000, Aleksa Sarai wrote:
On 2019-09-05, Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> wrote:
On 04/09/2019 22.19, Aleksa Sarai wrote:
A common pattern for syscall extensions is increasing the size of a
struct passed from userspace, such that the zero-value of the new fields
result in the old kernel behaviour (allowing for a mix of userspace and
kernel vintages to operate on one another in most cases). This is done
in both directions -- hence two helpers -- though it's more common to
have to copy user space structs into kernel space.
Previously there was no common lib/ function that implemented
the necessary extension-checking semantics (and different syscalls
implemented them slightly differently or incompletely[1]). A future
patch replaces all of the common uses of this pattern to use the new
copy_struct_{to,from}_user() helpers.
[1]: For instance {sched_setattr,perf_event_open,clone3}(2) all do do
similar checks to copy_struct_from_user() while rt_sigprocmask(2)
always rejects differently-sized struct arguments.
Suggested-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
---
diff --git a/lib/struct_user.c b/lib/struct_user.c
new file mode 100644
index 000000000000..7301ab1bbe98
--- /dev/null
+++ b/lib/struct_user.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2019 SUSE LLC
+ * Copyright (C) 2019 Aleksa Sarai <cyphar@xxxxxxxxxx>
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#define BUFFER_SIZE 64
+
+/*
+ * "memset(p, 0, size)" but for user space buffers. Caller must have already
+ * checked access_ok(p, size).
+ */
Isn't this __clear_user() exactly (perhaps except for the return value)?
Perhaps not every arch has that?
I didn't know about clear_user() -- I will switch to it.
+static int __memzero_user(void __user *p, size_t s)
+{
+ const char zeros[BUFFER_SIZE] = {};
+ while (s > 0) {
+ size_t n = min(s, sizeof(zeros));
+
+ if (__copy_to_user(p, zeros, n))
+ return -EFAULT;
+
+ p += n;
+ s -= n;
+ }
+ return 0;
+}
+
+/**
+ * copy_struct_to_user: copy a struct to user space
+ * @dst: Destination address, in user space.
+ * @usize: Size of @dst struct.
+ * @src: Source address, in kernel space.
+ * @ksize: Size of @src struct.
+ *
+ * Returns (in all cases, some data may have been copied):
+ * * -EFBIG: (@usize < @ksize) and there are non-zero trailing bytes in @src.
+ * * -EFAULT: access to user space failed.
+ */
+int copy_struct_to_user(void __user *dst, size_t usize,
+ const void *src, size_t ksize)
+{
+ size_t size = min(ksize, usize);
+ size_t rest = abs(ksize - usize);
Eh, I'd avoid abs() here due to the funkiness of the implicit type
conversions - ksize-usize has type size_t, then that's coerced to an int
(or a long maybe?), the abs is applied which return an int/long (or
unsigned versions?). Something like "rest = max(ksize, usize) - size;"
is more obviously correct and doesn't fall into any
narrowing/widening/sign extending traps.
Yeah, I originally used "max(ksize, usize) - size" for that reason but
was worried it looked too funky (and some quick tests showed that abs()
gives the right results in most cases -- though I just realised it would
probably not give the right results around SIZE_MAX). I'll switch back.
+ if (unlikely(usize > PAGE_SIZE))
+ return -EFAULT;
Please don't. That is a restriction on all future extensions - once a
kernel is shipped with a syscall using this helper with that arbitrary
restriction in place, that syscall is forever prevented from extending
its arg struct beyond PAGE_SIZE (which is arch-dependent anyway). Sure,
it's hard to imagine, but who'd have thought 32 O_* or CLONE_* bits
weren't enough for everybody?
This is only for future compatibility, and if someone runs an app
compiled against 7.3 headers on a 5.4 kernel, they probably don't care
about performance, but they would like their app to run.
I'm not sure I agree that the limit is in place *forever* -- it's
generally not a break in compatibility to convert an error into a
success (though, there are counterexamples such as mknod(2) -- but that
was a very specific case).
You're right that it would mean that some very new code won't run on
very ancient kernels (assuming we ever pass around structs that
massive), but there should be a reasonable trade-off here IMHO.
Passing a struct larger than a PAGE_SIZE right now (at least for all
those calls that would make use of this helper at the moment) is to be
considered a bug.
The PAGE_SIZE check is a reasonable heuristic. It's an assumption that
is pretty common in the kernel in other places as well. Plus the
possibility of DoS.
If we allow very large sizes, a program could probably DoS the kernel by
allocating a moderately-large block of memory and then spawning a bunch
of threads that all cause the kernel to re-check that the same 1GB block
of memory is zeroed. I haven't tried, but it seems like it's best to
avoid the possibility altogether.
+ }
+ /* Copy the interoperable parts of the struct. */
+ if (__copy_to_user(dst, src, size))
+ return -EFAULT;
I think I understand why you put this last instead of handling the
buffer in the "natural" order. However,
I'm wondering whether we should actually do this copy before checking
that the extra kernel bytes are 0 - the user will still be told that
there was some extra information via the -EFBIG/-E2BIG return, but maybe
in some cases the part he understands is good enough. But I also guess
we have to look to existing users to see whether that would prevent them
from being converted to using this helper.
linux-api folks, WDYT?
Regarding the order, I just copied what sched and perf already do. I
wouldn't mind doing it the other way around -- though I am a little
cautious about implicitly making guarantees like that. The syscall that
uses copy_struct_to_user() might not want to make that guarantee (it
might not make sense for them), and there are some -E2BIG returns that
won't result in data being copied (usize > PAGE_SIZE).
As for feedback, this is syscall-dependent at the moment. The sched and
perf users explicitly return the size of the kernel structure (by
overwriting uattr->size if -E2BIG is returned) for copies in either
direction. So users arguably already have some kind of feedback about
size issues. clone3() on the other hand doesn't do that (though it
doesn't copy anything to user-space so this isn't relevant to this
particular question).
Effectively, I'd like to see someone argue that this is something that
they would personally want (before we do it).
I think the order you have right now is fine. I don't see the point of
doing work first before we have verified that things are sane.