> On Aug 28, 2019, at 3:55 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
>> On Tue, Aug 27, 2019 at 11:12:29PM -0700, Andy Lutomirski wrote:
>>>>
>>>> From the previous discussion, you want to make progress toward solving
>>>> a lot of problems with CAP_BPF. One of them was making BPF
>>>> firewalling more generally useful. By making CAP_BPF grant the ability
>>>> to read kernel memory, you will make administrators much more nervous
>>>> to grant CAP_BPF.
>>>
>>> Andy, were your email hacked?
>>> I explained several times that in this proposal
>>> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
>>> CAP_BPF alone is _not enough_.
>>
>> You have indeed said this many times. You've stated it as a matter of
>> fact as though it cannot possibly discussed. I'm asking you to
>> justify it.
>
> That's not how I see it.
> I kept stating that both CAP_BPF and CAP_TRACING are necessary to read
> kernel memory whereas you kept distorting my statement by dropping second
> part and then making claims that "CAP_BPF grant the ability to read
> kernel memory, you will make administrators much more nervous".

Mea culpa. CAP_BPF does, however, appear to permit breaking kASLR due to unsafe pointer conversions, and it allows reading and writing everyone’s maps. I stand by my overall point.

>
> Just s/CAP_BPF/CAP_BPF and CAP_TRACING/ in this above sentence.
> See that meaning suddenly changes?
> Now administrators would be worried about tasks that have both at once.
> They also would be worried about tasks that have CAP_TRACING alone,
> because that's what allows probe_kernel_read().

This is not at all what I meant. Of course granting CAP_BPF+CAP_TRACING allows reading kernel memory. This is not at all a problem. Here is a problem I see:

CAP_TRACING + CAP_BPF allows modification of other people’s maps and potentially other things that should not be implied by CAP_TRACING alone and that don’t need to be available to tracers.
So CAP_TRACING, which is powerful but has somewhat limited scope, isn’t fully useful without CAP_BPF, and giving CAP_TRACING *and* CAP_BPF allows things that tracers shouldn’t be able to do. I think this would make the whole mechanism less useful to Android, for example.

(Also, I’m not sure quite what you mean by “CAP_TRACING ... allows probe_kernel_read()”. probe_kernel_read() is a kernel function that can’t be directly called by userspace. CAP_TRACING allows reading kernel memory in plenty of ways regardless.)

>
>> It seems like you are specifically trying to add a new switch to turn
>> as much of BPF as possible on and off. Why?
>
> Didn't I explain it several times already with multiple examples
> from systemd, daemons, bpftrace ?
>
> Let's try again.
> Take your laptop with linux distro.
> You're the only user there. I'm assuming you're not sharing it with
> partner and kids. This is my definition of 'single user system'.
> You can sudo on it at any time, but obviously prefer to run as many
> apps as possible without cap_sys_admin.
> Now you found some awesome open source app on the web that monitors
> the health of the kernel and will pop a nice message on a screen if
> something is wrong. Currently this app needs root. You hesitate,
> but the apps is so useful and it has strong upstream code review process
> that you keep running it 24/7.
> This is open source app. New versions come. You upgrade.
> You have enough trust in that app that you keep running it as root.
> But there is always a chance that new version doing accidentally
> something stupid as 'kill -9 -1'. It's an open source app at the end.
>
> Now I come with this CAP* proposal to make this app safer.
> I'm not making your system more secure and not making this app
> more secure. I can only make your laptop safer for day to day work
> by limiting the operations this app can do.
> This particular app monitors the kernel via bpf and tracing.
> Hence you can give it CAP_TRACING and CAP_BPF and drop the rest.

This won’t make me much more comfortable, since CAP_BPF lets it do an ever-growing set of nasty things. I’d much rather one or both of two things happen:

1. Give it CAP_TRACING only. It can leak my data, but it’s rather hard for it to crash my laptop, lose data, or cause other shenanigans.

2. Improve it a bit so all the privileged ops are wrapped by capset().

Does this make sense? I’m a security person on occasion. I find vulnerabilities and exploit them deliberately and I break things by accident on a regular basis. In my considered opinion, CAP_TRACING alone, even extended to cover part of BPF as I’ve described, is decently safe. Getting root with just CAP_TRACING will be decently challenging, especially if I don’t get to read things like sshd’s memory, and improvements to mitigate even that could be added. I am quite confident that attacks starting with CAP_TRACING will have clear audit signatures if auditing is on. I am also confident that CAP_BPF *will* allow DoS and likely privilege escalation, and this will only get more likely as BPF gets more widely used. And, if BPF-based auditing ever becomes a thing, writing to the audit daemon’s maps will be a great way to cover one’s tracks.

> I think they have no choice but to do kernel.unprivileged_bpf_disabled=1.
> We, as a kernel community, are forcing the users into it.
> Hence I really do not see a value in any proposal today that expands
> unprivileged bpf usage.

I think you’re overemphasizing bpf’s role in the whole speculation mess. I realize that you’ve spent an insane amount of time on mitigations to stupid issues. I’ve spent a less insane amount of time on mitigating similar issues outside of bpf. It’s a mess. At the end of the day, the kernel does its best, and new bugs show up. New CPUs will be less buggy.