On Fri, Aug 16, 2019 at 10:28:29PM +0200, Thomas Gleixner wrote:
> Alexei,
>
> On Fri, 16 Aug 2019, Alexei Starovoitov wrote:
> > It's both of the above when 'systemd' is not taken literally.
> > To earlier Thomas's point: the use case is not only about systemd.
> > There are other containers management systems.
>
> <SNIP>
>
> > These daemons need to drop privileges to make the system safer == less
> > prone to corruption due to bugs in themselves. Not necessary security
> > bugs.
>
> Let's take a step back.
>
> While real usecases are helpful to understand a design decision, the design
> needs to be usecase independent.
>
> The kernel provides mechanisms, not policies. My impression of this whole
> discussion is that it is policy driven. That's the wrong approach.

not sure what you mean by 'policy driven'.
Proposed CAP_BPF is a policy?

My desire to do kernel.unprivileged_bpf_disabled=1 is driven by
text in Documentation/x86/mds.rst which says:
"There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct."

commit 6a9e52927251 ("x86/speculation/mds: Add mds_clear_cpu_buffers()")
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Reviewed-by: Borislav Petkov <bp@xxxxxxx>
Reviewed-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Reviewed-by: Jon Masters <jcm@xxxxxxxxxx>
Tested-by: Jon Masters <jcm@xxxxxxxxxx>

The way I read this text:
- there is a concern that mds is exploitable via bpf
- there is a desire to investigate to address this concern

I'm committed to help with the investigation.

In the mean time I propose a path to do
kernel.unprivileged_bpf_disabled=1 which is CAP_BPF.

Can kernel.unprivileged_bpf_disabled=1 be used now?
Yes, but it will weaken overall system security because things that
use unpriv to load bpf and CAP_NET_ADMIN to attach bpf would need
to move to stronger CAP_SYS_ADMIN.

With CAP_BPF both load and attach would happen under CAP_BPF
instead of CAP_SYS_ADMIN.

> So let's look at the mechanisms which we have at hand:
>
>  1) Capabilities
>
>  2) SUID and dropping priviledges
>
>  3) Seccomp and LSM
>
> Now the real interesting questions are:
>
>  A) What kind of restrictions does BPF allow? Is it a binary on/off or is
>     there a more finegrained control of BPF functionality?
>
>     TBH, I can't tell.
>
>  B) Depending on the answer to #A what is the control possibility for
>     #1/#2/#3 ?

Can any of the mechanisms 1/2/3 address the concern in mds.rst?

I believe Andy wants to expand the attack surface when
kernel.unprivileged_bpf_disabled=0
Before that happens I'd like the community to work on addressing the text above.
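
For anyone who wants to check which mode a given machine is in, here is a
minimal sketch that reads the sysctl through procfs. The helper name
read_unprivileged_bpf_disabled is made up for illustration, and treating any
nonzero value as "disabled" is an assumption on my part. It only reports the
setting; it says nothing about which capabilities a particular loader holds.

```python
from pathlib import Path
from typing import Optional

# procfs location of the kernel.unprivileged_bpf_disabled sysctl.
SYSCTL_PATH = "/proc/sys/kernel/unprivileged_bpf_disabled"


def read_unprivileged_bpf_disabled(path: str = SYSCTL_PATH) -> Optional[int]:
    """Return the sysctl value as an int, or None if it cannot be read.

    Hypothetical helper for illustration only. None covers a missing file
    (kernel built without the sysctl, or not Linux) and unparsable content.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_unprivileged_bpf_disabled()
    if value is None:
        print("kernel.unprivileged_bpf_disabled: not available")
    elif value == 0:
        print("kernel.unprivileged_bpf_disabled=0: unprivileged bpf() allowed")
    else:
        # Assumption: any nonzero value means unprivileged bpf() is off.
        print(f"kernel.unprivileged_bpf_disabled={value}: unprivileged bpf() disabled")
```

The path is a parameter so the helper can be pointed at a scratch file when
testing, without touching the real sysctl.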