On Tue, Aug 27, 2019 at 9:43 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
> >
> > I was hoping for something in Documentation/admin-guide, not in a
> > changelog that's hard to find.
>
> eventually yes.
>
> > >
> > > > Changing the capability that some existing operation requires could
> > > > break existing programs. The old capability may need to be accepted
> > > > as well.
> > >
> > > As far as I can see there is no ABI breakage. Please point out
> > > which line of the patch may break it.
> >
> > As a more or less arbitrary selection:
> >
> >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >  {
> >         if (!bpf_prog_kallsyms_candidate(fp) ||
> > -           !capable(CAP_SYS_ADMIN))
> > +           !capable(CAP_BPF))
> >                 return;
> >
> > Before your patch, a task with CAP_SYS_ADMIN could do this. Now it
> > can't. Per the usual Linux definition of "ABI break", this is an ABI
> > break if and only if someone actually did this in a context where they
> > have CAP_SYS_ADMIN but not all capabilities. How confident are you
> > that no one does things like this?
>
> Yes. I'm confident that apps don't drop everything and
> leave cap_sys_admin only before doing bpf() syscall, since it would
> break their own use of networking.
> Hence I'm not going to do the cap_syslog-like "deprecated" message mess
> because of this unfounded concern.
> If I turn out to be wrong we will add this "deprecated mess" later.
>
> >
> > From the previous discussion, you want to make progress toward solving
> > a lot of problems with CAP_BPF. One of them was making BPF
> > firewalling more generally useful. By making CAP_BPF grant the ability
> > to read kernel memory, you will make administrators much more nervous
> > to grant CAP_BPF.
>
> Andy, were your email hacked?
> I explained several times that in this proposal
> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
> CAP_BPF alone is _not enough_.

You have indeed said this many times. You've stated it as a matter of
fact, as though it cannot possibly be discussed. I'm asking you to
justify it.

> > Similarly, and correct me if I'm wrong, most of
> > these capabilities are primarily or only useful for tracing, so I
> > don't see why users without CAP_TRACING should get them.
> > bpf_trace_printk(), in particular, even has "trace" in its name :)
> >
> > Also, if a task has CAP_TRACING, it's expected to be able to trace the
> > system -- that's the whole point. Why shouldn't it be able to use BPF
> > to trace the system better?
>
> CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.

What does "do BPF" even mean? seccomp() does BPF. SO_ATTACH_FILTER
does BPF. Saying that using BPF should require a specific capability
seems kind of like saying that using the network should require a
specific capability. Linux (and Unixy systems in general) distinguish
between binding low-number ports, binding high-number ports, using raw
sockets, and changing the system's IP address. These have different
implications and require different capabilities.

It seems like you are specifically trying to add a new switch to turn
as much of BPF as possible on and off. Why?

> >
> > test_run allows fully controlled inputs, in a context where a program
> > can trivially flush caches, mistrain branch predictors, etc first. It
> > seems to me that, if a JITted bpf program contains an exploitable
> > speculation gadget (MDS, Spectre v1, RSB, or anything else),
>
> speaking of MDS... I already asked you to help investigate its
> applicability with existing bpf exposure. Are you going to do that?
I am blissfully uninvolved in MDS, and I don't know all that much more
about the overall mechanism than a random reader of tech news :)

ISTM there are two meaningful ways that BPF could be involved: a BPF
program could leak info into the state exposed by MDS, or a BPF program
could try to read that state. From what little I understand, it's
essentially inevitable that BPF leaks information into MDS state, and
this is probably even controllable by an attacker that understands MDS
in enough detail. So the interesting questions are: can BPF be used to
read MDS state, and can BPF be used to leak information to an attacker
in a more useful way than the rest of the kernel can?

Keeping in mind that the kernel will flush MDS state on every exit to
usermode, I think the most likely attack is to try to read MDS state
with BPF. This could happen, I suppose -- BPF programs can easily
contain the usual speculation gadgets of "do something and read an
address that depends on the outcome". Fortunately, outside of
bpf_probe_read(), AFAIK BPF programs can't directly touch user memory,
and an attacker that is allowed to use bpf_probe_read() doesn't need
MDS to read things. So it's not entirely obvious to me how an attack
would be mounted. test_run would make it a lot easier, I think.

> > it will
> > be *much* easier to exploit it using test_run than using normal
> > network traffic. Similarly, normal network traffic will have network
> > headers that are valid enough to have caused the BPF program to be
> > invoked in the first place. test_run can inject arbitrary garbage.
>
> Please take a look at Jann's var1 exploit. Was it hard to run bpf prog
> in controlled environment without test_run command ?
>

Can you send me a link?