On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote: > On Sun, Oct 13, 2019 at 12:10 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > No arguments re put_user_ex side of things... Below is a completely > > untested patch for get_user_ex elimination (it seems to build, but that's > > it); in any case, I would really like to see comments from x86 folks > > before it goes anywhere. > > Please don't do this: > > > + if (unlikely(__copy_from_user(&sc, usc, sizeof(sc)))) > > + goto Efault; > > Why would you use __copy_from_user()? Just don't. > > > + if (unlikely(__copy_from_user(&v, user_vm86, > > + offsetof(struct vm86_struct, int_revectored)))) > > Same here. > > There's no excuse for __copy_from_user(). FWIW, callers of __copy_from_user() remaining in the generic code: 1) regset.h:user_regset_copyin(). Switch to copy_from_user(); the calling conventions of regset ->set() (as well as the method name) are atrocious, but there are too many instances to mix any work in that direction into this series. Yes, nominally it's an inline, but IRL it's too large and has many callers in the same file(s), so any optimizations of inlining __copy_from_user() will be lost and there's more than enough work done there to make access_ok() a noise. And in this case it doesn't pay to try and lift user_access_begin() into the callers - the work done between the calls is often too non-trivial to be done in such area. The same goes for other regset.h stuff; eventually we might want to try and come up with saner API, but that's a separate story. 2) default csum_partial_copy_from_user(). What we need to do is turn it into default csum_and_copy_from_user(). This #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER static inline __wsum csum_and_copy_from_user (const void __user *src, void *dst, int len, __wsum sum, int *err_ptr) { if (access_ok(src, len)) return csum_partial_copy_from_user(src, dst, len, sum, err_ptr); if (len) *err_ptr = -EFAULT; return sum; } #endif in checksum.h is the only thing that calls that sucker and we can bloody well combine them and make the users of lib/checksum.h define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER. That puts us reasonably close to having _HAVE_ARCH_COPY_AND_CSUM_FROM_USER unconditional and in any case, __copy_from_user() in lib/checksum.h turns into copy_from_user(). 3) firewire ioctl_queue_iso(). Convert to copy_from_user(), lose the access_ok() before the loop. Definitely not an unsafe_... situation (we call fw_iso_context_queue() after each chunk; _not_ something we want under user_access_begin()/user_access_end()) and it's really not worth trying to save on access_ok() checks there. 4) pstore persistent_ram_update_user(). Obvious copy_from_user(); definitely lose access_ok() in the caller (persistent_ram_write_user()), along with the one in write_pmsg() (several calls back by the callchain). 5) test_kasan: lose the function, lose the tests... 6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise (i.e. fold with immediately preceding kmalloc()s). sg_new_write() - fold with access_ok() into copy_from_user() (for both call sites). sg_write() - lose access_ok(), use copy_from_user() (both call sites) and get_user() (instead of the solitary __get_user() there). 7) i915 ones are, frankly, terrifying. Consider e.g. this one: relocs = kvmalloc_array(size, 1, GFP_KERNEL); if (!relocs) { err = -ENOMEM; goto err; } /* copy_from_user is limited to < 4GiB */ copied = 0; do { unsigned int len = min_t(u64, BIT_ULL(31), size - copied); if (__copy_from_user((char *)relocs + copied, (char __user *)urelocs + copied, len)) goto end; copied += len; } while (copied < size); Is that for real? Are they *really* trying to allocate and copy >2Gb of userland data? That's eb_copy_relocations() and that crap is itself in a loop. Sizes come from user-supplied data. WTF? It's some weird kmemdup lookalike and I'd rather heard from maintainers of that thing before doing anything with it. 8) vhost_copy_from_user(). Need comments from mst - it's been a while since I crawled through that code and I'd need his ACK anyway. The logics with positioning of access_ok() in there is non-trivial and I'm not sure how much of that serves as early input validation and how much can be taken out and replaced by use of place copy_from_user() and friends. 9) KVM. There I'm not sure that access_ok() would be the right thing to do. kvm_is_error_hva() tends to serve as the range check in that and similar places; it's not the same situation as with NMI, but... And that's it - everything else is in arch/*. Looking at arch/x86, we have * insanity in math_emu (unchecked return value, for example) * a bunch sigframe-related code. Some want to use unsafe_... (or raw_...) variant, some should probably go for copy_from_user(). FPU-related stuff is particularly interesting in that respect - there we have several inline functions nearby that contain nothing but stac + instruction + clac + exception handling. And in quite a few cases it would've been cleaner to lift stac/clac into the callers, since they combine nicely. * regset_tls_set(): use copy_from_user(). * one in kvm walk_addr_generic stuff. If nothing else, that one smells like __get_user() - we seem to be copying a single PTE. And again, it's using kvm_is_error_hva().