On Fri, Feb 22, 2019 at 04:08:59PM -0800, Linus Torvalds wrote: > On Fri, Feb 22, 2019 at 3:56 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > It will preserve existing bpf_probe_read() behavior on x86. > > ... but that's the worst possible situation. > > It appears that people haven't understood that kernel and user > addresses are distinct, and may have written programs that are > fundamentally buggy. > > And we _want_ to make it clear that they are buggy on x86-64, exactly > because x86-64 is the one that gets the most testing - by far. > > So if x86-64 continues working for buggy programs, then that only > means that those bugs never get fixed. > > It would be much better to try to get those things fixed, and make the > x86-64 implementation stricter, exactly so that people end up > _realizing_ that they can't just think "a pointer is a pointer, and > the context doesn't matter". > > From a pure functional safety standpoint, I thought bpf already knew > what kind of a pointer it had? when bpf verifier knows the type of pointer it allows direct access to it. That's the case for skb, socket, packet data, hash maps, arrays, stack, etc. Networking progs cannot call bpf_probe_read(). It's available to tracing progs only and their goal is to walk the kernel and user memory with addresses that cannot be statically verified at program load time. We are working on adding type information (BTF) to vmlinux. Soon we'll be able to tell the type of every kernel function argument. Right now arg1 and arg2 in a kprobed function are just u64 pt_regs->di, si. Soon we'll be able to precisely identify their C type. I completely agree on the direction that x86 is the architecture that sets an example and users need to learn the difference in pointers. I only disagree on timing. Right now users don't have an alternative. In our repo I counted ~400 calls to bpf_probe_read and about 10 times more 'indirect' calls. What's happening with 'indirect' calls is BCC toolchain using clang to automatically replace struct_a->field_foo access with bpf_probe_read(struct_a + offsetof(typeof(struct_a), field_foo)); If we had __user vs __kernel annotation available to clang we could have taught BCC to replace this '->' dereference with appropriate kernel vs user helper. Also we need to teach GCC to recognize __user and store into dwarf, so we can take it further into BTF and verify later. Also I think disallowing bpf_probe_read() to read user pointer will not make a desired teaching effect. It will only cause painful debugging to folks when their progs will stop working. It's better to remove bpf_probe_read() completely. imo the process of teaching the users of kernel vs user pointer difference needs to be gradual. First we introduce bpf_probe_kernel_read and bpf_probe_user_read and introduce clang/gcc tooling to catch the mistakes. Going over this 400+ places and manually grepping kernel sources for __user keyword is not a great proposal if we want to keep those users. Once we have this working we can remove bpf_probe_read() altogether. Rejecting bpf prog at load time is a clear signal that user has to fix it (instead of changing run-time behavior). When the verifier gets even smarter it could potentially replace prob_read with probe_kernel_read and probe_user_read when it has that type info. imo this kernel release should finish as-is and in the next cycle we can change probe_kernel_read() to reject user address, have temporary workaround in bpf_probe_read() with probe_kernel_read+get_user hack, introduce new bpf helpers, new tooling and eventually remove buggy bpf_probe_read.