On Sun, May 16, 2021 at 1:39 AM Tianyin Xu <tyxu@xxxxxxxxxxxx> wrote:
>
> On Sat, May 15, 2021 at 10:49 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > On 5/10/21 10:21 PM, YiFei Zhu wrote:
> > > On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > >> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> > >>>
> > >>> From: YiFei Zhu <yifeifz2@xxxxxxxxxxxx>
> > >>>
> > >>> Based on: https://urldefense.com/v3/__https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html__;!!DZ3fjg!thbAoRgmCeWjlv0qPDndNZW1j6Y2Kl_huVyUffr4wVbISf-aUiULaWHwkKJrNJyo$
> > >>>
> > >>> This patchset enables seccomp filters to be written in eBPF.
> > >>> Supporting eBPF filters has been proposed a few times in the past.
> > >>> The main concerns were (1) use cases and (2) security. We have
> > >>> identified many use cases that can benefit from advanced eBPF
> > >>> filters, such as:
> > >>
> > >> I haven't reviewed this carefully, but I think we need to distinguish
> > >> a few things:
> > >>
> > >> 1. Using the eBPF *language*.
> > >>
> > >> 2. Allowing the use of stateful / non-pure eBPF features.
> > >>
> > >> 3. Allowing the eBPF programs to read the target process' memory.
> > >>
> > >> I'm generally in favor of (1). I'm not at all sure about (2), and I'm
> > >> even less convinced by (3).
> > >>
> > >>>
> > >>> * exec-only-once filter / apply filter after exec
> > >>
> > >> This is (2). I'm not sure it's a good idea.
> > >
> > > The basic idea is that for a container runtime it may wait to execute
> > > a program in a container without that program being able to execve
> > > another program, stopping any attack that involves loading another
> > > binary. The container runtime can block any syscall but execve in the
> > > exec-ed process by using only cBPF.
> > >
> > > The use case is suggested by Andrea Arcangeli and Giuseppe Scrivano.
> > > @Andrea and @Giuseppe, could you clarify more in case I missed
> > > something?
> >
> > We've discussed having a notifier-using filter be able to replace its
> > filter. This would allow this and other use cases without any
> > additional eBPF or cBPF code.
> >
>
> A notifier is not always a solution (even ignoring its perf overhead).
>
> One problem, pointed out by Andrea Arcangeli, is that notifiers need
> userspace daemons. So, it can hardly be used by daemonless container
> engines like Podman.
>
> And, /* sorry for repeating.. */ the performance overhead of notifiers
> is not close to ebpf, which prevents use cases that require native
> performance.

While I agree with you that this is the case right now, there's no
reason it has to be the case. There's a variety of mechanisms that can
be employed to significantly speed up the performance of the notifier.
For example, right now the notifier is behind one large per-filter
lock. That could be removed allowing for better concurrency. There are
a large number of mechanisms that scale O(n) with the outstanding
notifications -- again, something that could be improved.

The other big improvement that could be made is being able to use
something like io_uring with the notifier interface, but it would
require a fairly significant user API change -- and a move away from
ioctl. I'm not sure if people are excited about that idea at the
moment.

> >
> > >> eBPF doesn't really have a privilege model yet. There was a long and
> > >> disappointing thread about this awhile back.
> > >
> > > The idea is that “seccomp-eBPF does not make life easier for an
> > > adversary”. Any attack an adversary could potentially utilize
> > > seccomp-eBPF, they can do the same with other eBPF features, i.e. it
> > > would be an issue with eBPF in general rather than specifically
> > > seccomp’s use of eBPF.
> > >
> > > Here it is referring to the helpers goes to the base
> > > bpf_base_func_proto if the caller is unprivileged (!bpf_capable ||
> > > !perfmon_capable). In this case, if the adversary would utilize eBPF
> > > helpers to perform an attack, they could do it via another
> > > unprivileged prog type.
> > >
> > > That said, there are a few additional helpers this patchset is adding:
> > > * get_current_uid_gid
> > > * get_current_pid_tgid
> > > These two provide public information (are namespaces a concern?). I
> > > have no idea what kind of exploit it could add unless the adversary
> > > somehow side-channels the task_struct? But in that case, how is the
> > > reading of task_struct different from how the rest of the kernel is
> > > reading task_struct?
> >
> > Yes, namespaces are a concern. This idea got mostly shot down for kdbus
> > (what ever happened to that?), and it likely has the same problems for
> > seccomp.
> >
> > >>
> > >> What is this for?
> > >
> > > Memory reading opens up lots of use cases. For example, logging what
> > > files are being opened without imposing too much performance penalty
> > > from strace. Or as an accelerator for user notify emulation, where
> > > syscalls can be rejected on a fast path if we know the memory contents
> > > does not satisfy certain conditions that user notify will check.
> > >
> >
> > This has all kinds of race conditions.
> >
> >
> > I hate to be a party pooper, but this patchset is going to very high bar
> > to acceptance. Right now, seccomp has a couple of excellent properties:
> >
> > First, while it has limited expressiveness, it is simple enough that the
> > implementation can be easily understood and the scope for
> > vulnerabilities that fall through the cracks of the seccomp sandbox
> > model is low. Compare this to Windows' low-integrity/high-integrity
> > sandbox system: there is a never ending string of sandbox escapes due to
> > token misuse, unexpected things at various integrity levels, etc.
> > Seccomp doesn't have tokens or integrity levels, and these bugs don't
> > happen.
> >
> > Second, seccomp works, almost unchanged, in a completely unprivileged
> > context. The last time making eBPF work sensibly in a less- or
> > -unprivileged context, the maintainers mostly rejected the idea of
> > developing/debugging a permission model for maps, cleaning up the bpf
> > object id system, etc. You are going to have a very hard time
> > convincing the seccomp maintainers to let any of these mechanism
> > interact with seccomp until the underlying permission model is in place.
> >
> > --Andy
>
> Thanks for pointing out the tradeoff between expressiveness vs. simplicity.
>
> Note that we are _not_ proposing to replace cbpf, but propose to also
> support ebpf filters. There certainly are use cases where cbpf is
> sufficient, but there are also important use cases ebpf could make
> life much easier.
>
> Most importantly, we strongly believe that ebpf filters can be
> supported without reducing security.
>
> No worries about “party pooping” and we appreciate the feedback. We’d
> love to hear concerns and collect feedback so we can address them to
> hit that very high bar.
>
>
> ~t
>
> --
> Tianyin Xu
> University of Illinois at Urbana-Champaign
> https://tianyin.github.io/