On Fri, 22 Feb 2019 09:43:14 -0800 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Fri, Feb 22, 2019 at 12:35 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote: > > > > Or, can we do this? > > > > long __probe_user_read(void *dst, const void *src, size_t size) > > { > > Add a > > if (!access_ok(src, size)) > ret = -EFAULT; > else { > .. do the pagefault_disable() etc .. > } Since kprobes handler runs in IRQ context, we can not use access_ok() in it. (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y) In arch/x86/include/asm/uaccess.h: #define access_ok(addr, size) \ ({ \ WARN_ON_IN_IRQ(); \ likely(!__range_not_ok(addr, size, user_addr_max())); \ }) Do we need acccess_ok_inatomic()? BTW, it seems a bit strange that this WARN_ON_IN_IRQ() is only in x86 access_ok() implementation, since CONFIG_DEBUG_ATOMIC_SLEEP(which defines WARN_ON_IN_IRQ) doesn't depend on x86, and access_ok() is widely used in kernel. I think it would be better that each arch provides __access_ok() and include/linux/uaccess.h provides access_ok() with WARN_ON_IN_IRQ(). > to after the "set_fs()", and it looks good to me. Make it clear that > yes, this works _only_ for user reads. > > Adn that makes all the games with "kernel_uaccess_faults_ok" > pointless, so you can just remove them. OK. > > (note that the "access_ok()" has to come after we've done "set_fs()", > because it takes the address limit from that). > > Also, since normally we'd expect that we already have USER_DS, it > might be worthwhile to do this with a wrapper, something along the > lines of > > mm_segment_t old_fs = get_fs(); > > if (segment_eq(old_fs, USER_DS)) > return __normal_probe_user_read(); > set_fs(USER_DS); > ret = __normal_probe_user_read(); > set_fs(old_fs); > return ret; > > and have that __normal_probe_user_read() just do > > if (!access_ok(src, size)) > return -EFAULT; > pagefault_disable(); > ret = __copy_from_user_inatomic(dst, ...); > pagefault_enable(); > return ret ? -EFAULT : 0; > > which looks more obvious. OK. > > Also, I would suggest that you just make the argument type be "const > void __user *", since the whole point is that this takes a user > pointer, and nothing else. Ah, right. > Then we should still probably fix up "__probe_kernel_read()" to not > allow user accesses. The easiest way to do that is actually likely to > use the "unsafe_get_user()" functions *without* doing a > uaccess_begin(), which will mean that modern CPU's will simply fault > on a kernel access to user space. Or, use __chk_user_ptr(ptr) to check it? Thank you, > > The nice thing about that is that usually developers will have access > to exactly those modern boxes, so the people who notice that it > doesn't work are the right people. > > Alternatively, we should just make it be architecture-specific, so > that architectures can decide "this address cannot be a kernel > address" and refuse to do it. > > Linus -- Masami Hiramatsu <mhiramat@xxxxxxxxxx>