On Thu, Sep 05, 2019 at 07:50:26PM +1000, Aleksa Sarai wrote: > On 2019-09-05, Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> wrote: > > On 04/09/2019 22.19, Aleksa Sarai wrote: > > > A common pattern for syscall extensions is increasing the size of a > > > struct passed from userspace, such that the zero-value of the new fields > > > result in the old kernel behaviour (allowing for a mix of userspace and > > > kernel vintages to operate on one another in most cases). This is done > > > in both directions -- hence two helpers -- though it's more common to > > > have to copy user space structs into kernel space. > > > > > > Previously there was no common lib/ function that implemented > > > the necessary extension-checking semantics (and different syscalls > > > implemented them slightly differently or incompletely[1]). A future > > > patch replaces all of the common uses of this pattern to use the new > > > copy_struct_{to,from}_user() helpers. > > > > > > [1]: For instance {sched_setattr,perf_event_open,clone3}(2) all do do > > > similar checks to copy_struct_from_user() while rt_sigprocmask(2) > > > always rejects differently-sized struct arguments. > > > > > > Suggested-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> > > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > > > --- > > > diff --git a/lib/struct_user.c b/lib/struct_user.c > > > new file mode 100644 > > > index 000000000000..7301ab1bbe98 > > > --- /dev/null > > > +++ b/lib/struct_user.c > > > @@ -0,0 +1,182 @@ > > > +// SPDX-License-Identifier: GPL-2.0-or-later > > > +/* > > > + * Copyright (C) 2019 SUSE LLC > > > + * Copyright (C) 2019 Aleksa Sarai <cyphar@xxxxxxxxxx> > > > + */ > > > + > > > +#include <linux/types.h> > > > +#include <linux/export.h> > > > +#include <linux/uaccess.h> > > > +#include <linux/kernel.h> > > > +#include <linux/string.h> > > > + > > > +#define BUFFER_SIZE 64 > > > + > > > +/* > > > + * "memset(p, 0, size)" but for user space buffers. Caller must have already > > > + * checked access_ok(p, size). > > > + */ > > > > Isn't this __clear_user() exactly (perhaps except for the return value)? > > Perhaps not every arch has that? > > I didn't know about clear_user() -- I will switch to it. > > > > +static int __memzero_user(void __user *p, size_t s) > > > +{ > > > + const char zeros[BUFFER_SIZE] = {}; > > > + while (s > 0) { > > > + size_t n = min(s, sizeof(zeros)); > > > + > > > + if (__copy_to_user(p, zeros, n)) > > > + return -EFAULT; > > > + > > > + p += n; > > > + s -= n; > > > + } > > > + return 0; > > > +} > > > + > > > +/** > > > + * copy_struct_to_user: copy a struct to user space > > > + * @dst: Destination address, in user space. > > > + * @usize: Size of @dst struct. > > > + * @src: Source address, in kernel space. > > > + * @ksize: Size of @src struct. > > > + * > > > + * Returns (in all cases, some data may have been copied): > > > + * * -EFBIG: (@usize < @ksize) and there are non-zero trailing bytes in @src. > > > + * * -EFAULT: access to user space failed. > > > + */ > > > +int copy_struct_to_user(void __user *dst, size_t usize, > > > + const void *src, size_t ksize) > > > +{ > > > + size_t size = min(ksize, usize); > > > + size_t rest = abs(ksize - usize); > > > > Eh, I'd avoid abs() here due to the funkiness of the implicit type > > conversions - ksize-usize has type size_t, then that's coerced to an int > > (or a long maybe?), the abs is applied which return an int/long (or > > unsigned versions?). Something like "rest = max(ksize, usize) - size;" > > is more obviously correct and doesn't fall into any > > narrowing/widening/sign extending traps. > > Yeah, I originally used "max(ksize, usize) - size" for that reason but > was worried it looked too funky (and some quick tests showed that abs() > gives the right results in most cases -- though I just realised it would > probably not give the right results around SIZE_MAX). I'll switch back. > > > > + if (unlikely(usize > PAGE_SIZE)) > > > + return -EFAULT; > > > > Please don't. That is a restriction on all future extensions - once a > > kernel is shipped with a syscall using this helper with that arbitrary > > restriction in place, that syscall is forever prevented from extending > > its arg struct beyond PAGE_SIZE (which is arch-dependent anyway). Sure, > > it's hard to imagine, but who'd have thought 32 O_* or CLONE_* bits > > weren't enough for everybody? > > > > This is only for future compatibility, and if someone runs an app > > compiled against 7.3 headers on a 5.4 kernel, they probably don't care > > about performance, but they would like their app to run. > > I'm not sure I agree that the limit is in place *forever* -- it's > generally not a break in compatibility to convert an error into a > success (though, there are counterexamples such as mknod(2) -- but that > was a very specific case). > > You're right that it would mean that some very new code won't run on > very ancient kernels (assuming we ever pass around structs that > massive), but there should be a reasonable trade-off here IMHO. Passing a struct larger than a PAGE_SIZE right now (at least for all those calls that would make use of this helper at the moment) is to be considered a bug. The PAGE_SIZE check is a reasonable heuristic. It's an assumption that is pretty common in the kernel in other places as well. Plus the possibility of DoS. > > If we allow very large sizes, a program could probably DoS the kernel by > allocating a moderately-large block of memory and then spawning a bunch > of threads that all cause the kernel to re-check that the same 1GB block > of memory is zeroed. I haven't tried, but it seems like it's best to > avoid the possibility altogether. > > > > + } > > > + /* Copy the interoperable parts of the struct. */ > > > + if (__copy_to_user(dst, src, size)) > > > + return -EFAULT; > > > > I think I understand why you put this last instead of handling the > > buffer in the "natural" order. However, > > I'm wondering whether we should actually do this copy before checking > > that the extra kernel bytes are 0 - the user will still be told that > > there was some extra information via the -EFBIG/-E2BIG return, but maybe > > in some cases the part he understands is good enough. But I also guess > > we have to look to existing users to see whether that would prevent them > > from being converted to using this helper. > > > > linux-api folks, WDYT? > > Regarding the order, I just copied what sched and perf already do. I > wouldn't mind doing it the other way around -- though I am a little > cautious about implicitly making guarantees like that. The syscall that > uses copy_struct_to_user() might not want to make that guarantee (it > might not make sense for them), and there are some -E2BIG returns that > won't result in data being copied (usize > PAGE_SIZE). > > As for feedback, this is syscall-dependent at the moment. The sched and > perf users explicitly return the size of the kernel structure (by > overwriting uattr->size if -E2BIG is returned) for copies in either > direction. So users arguably already have some kind of feedback about > size issues. clone3() on the other hand doesn't do that (though it > doesn't copy anything to user-space so this isn't relevant to this > particular question). > > Effectively, I'd like to see someone argue that this is something that > they would personally want (before we do it). I think the order you have right now is fine. I don't see the point of doing work first before we have verified that things are sane. _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers