On Fri, 22 Feb 2019 15:56:20 -0800 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > On Fri, Feb 22, 2019 at 03:16:35PM -0800, Linus Torvalds wrote: > > > > So a kernel pointer value of 0x12345678 could be a value kernel > > pointer pointing to some random kmalloc'ed kernel memory, and a user > > pointer value of 0x12345678 could be a valid _user_ pointer pointing > > to some user mapping. > > > > See? > > > > If you access a user pointer, you need to use a user accessor function > > (eg "get_user()"), while if you access a kernel pointer you need to > > just dereference it directly (unless you can't trust it, in which case > > you need to use a _different_ accessor function). > > that was clear already. > Reading 0x12345678 via probe_kernel_read can return valid value > and via get_user() can return another valid value on _some_ architectures. > > > The fact that user and kernel pointers happen to be distinct on x86-64 > > (right now) is just a random implementation detail. > > yes and my point that people already rely on this implementation detail. > Say we implement > int bpf_probe_read(void *val, void *unsafe_ptr) > { > if (probe_kernel_read(val, unsafe_ptr) == OK) { > return 0; > } else (get_user(val, unsafe_ptr) == OK) { > return 0; > } else { > *val = 0; > return -EFAULT; > } > } Note that we can not use get_user() form kprobe handler. If you use it, you have to prepare fault_handler() and make bpf itself can be aborted. So, maybe you can use probe_user_read(). Hmm, however, it still doesn't work correctly on "some" architecture, since whether a pointer (address) points user-space or kernel-space depends on the context. In kprobe/bpf, the context means where you put the probe and which pointer you record. I think only "__user" tag tells us which one is user-space. But unfortunately, that "__user" tag is only for compiler or checker, not for runtime binary. Such useful attribute goes away when we execute it. So, even if we introduce "ustring", ftrace/perf users has to decide to use it by themselves. As far as I know, DWARF(debuginfo) also doesn't have that attribute. So perf-probe can not help it from debuginfo. (Maybe if we introduce C parser, it might be detected...) > It will preserve existing bpf_probe_read() behavior on x86. > If x86 implementation changes tomorrow then progs that read user > addresses may start failing randomly because first probe_kernel_read() > will be returning random values from kernel memory and that's no good, > but at least we won't be breaking them today, so we have time to > introduce bpf_user_read and bpf_kernel_read and folks have time to adopt them. I see. I think bpf also has to introduce new bpf_probe_read_user() and keep bpf_probe_read() for kernel dataa only. > Imo that's much better than making current bpf_probe_read() fail > on user addresses today and not providing a non disruptive path forward. Agreed. Thank you, -- Masami Hiramatsu <mhiramat@xxxxxxxxxx>