On Mon, May 17, 2021 at 10:40 AM Tycho Andersen <tycho@tycho.pizza> wrote:
>
> On Sun, May 16, 2021 at 03:38:00AM -0500, Tianyin Xu wrote:
> > On Sat, May 15, 2021 at 10:49 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > >
> > > On 5/10/21 10:21 PM, YiFei Zhu wrote:
> > > > On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > >> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> > > >>>
> > > >>> From: YiFei Zhu <yifeifz2@xxxxxxxxxxxx>
> > > >>>
> > > >>> Based on: https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html
> > > >>>
> > > >>> This patchset enables seccomp filters to be written in eBPF.
> > > >>> Supporting eBPF filters has been proposed a few times in the past.
> > > >>> The main concerns were (1) use cases and (2) security. We have
> > > >>> identified many use cases that can benefit from advanced eBPF
> > > >>> filters, such as:
> > > >>
> > > >> I haven't reviewed this carefully, but I think we need to distinguish
> > > >> a few things:
> > > >>
> > > >> 1. Using the eBPF *language*.
> > > >>
> > > >> 2. Allowing the use of stateful / non-pure eBPF features.
> > > >>
> > > >> 3. Allowing the eBPF programs to read the target process' memory.
> > > >>
> > > >> I'm generally in favor of (1). I'm not at all sure about (2), and I'm
> > > >> even less convinced by (3).
> > > >>
> > > >>>
> > > >>> * exec-only-once filter / apply filter after exec
> > > >>
> > > >> This is (2). I'm not sure it's a good idea.
> > > >
> > > > The basic idea is that for a container runtime it may want to execute
> > > > a program in a container without that program being able to execve
> > > > another program, stopping any attack that involves loading another
> > > > binary. The container runtime can block any syscall but execve in the
> > > > exec-ed process by using only cBPF.
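To make the cBPF limitation concrete, here is a minimal sketch (not code from the patchset) of a stateless classic-BPF seccomp filter that makes execve/execveat fail with EPERM and allows everything else. It assumes Linux on x86_64 or aarch64, kernel headers new enough to define SECCOMP_RET_KILL_PROCESS (4.14+), and an executable /bin/true; install_no_exec_filter, exec_errno_under_filter and SKETCH_AUDIT_ARCH are hypothetical names used only for illustration. Because a cBPF filter keeps no state between syscalls, it cannot express "allow the next execve, deny every later one": if the runtime installs it before its own execve, that execve is denied too. Expressing exec-only-once therefore needs state, i.e. case (2) above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: this sketch only handles x86_64 and aarch64. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "sketch only handles x86_64 and aarch64"
#endif

/* Hypothetical helper: install a stateless cBPF filter on the calling
 * thread that makes execve/execveat fail with EPERM and allows every
 * other syscall. x32 syscalls are not handled. Returns 0 on success. */
static int install_no_exec_filter(void)
{
    struct sock_filter filter[] = {
        /* [0] Load the audit arch; syscall numbers are ABI-specific. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* [1] Native ABI: jump to [3]; otherwise fall through to [2]. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        /* [2] Foreign ABI: kill the whole process. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* [3] Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* [4] execve: jump to [7]. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 2, 0),
        /* [5] execveat: jump to [7]. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execveat, 1, 0),
        /* [6] Everything else is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* [7] No per-call state: every exec attempt gets EPERM. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Needed to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical demo: fork, install the filter in the child only, then try
 * to exec /bin/true. Returns the errno seen by the child's execve (EPERM
 * when the filter is active), 0 if the exec went through, 127 if the
 * filter could not be installed, or -1 on fork/wait failure. */
static int exec_errno_under_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *argv[] = { "/bin/true", NULL };
        char *envp[] = { NULL };
        if (install_no_exec_filter() != 0)
            _exit(127);
        execve("/bin/true", argv, envp);
        _exit(errno); /* reached only if execve failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called from an unfiltered parent, exec_errno_under_filter() should return EPERM: the child's own execve is refused, which is exactly what a runtime that still needs one exec before locking down cannot accept. Applying the filter only after that exec needs either state in the filter or an agent outside the process.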
> > > >
> > > > The use case is suggested by Andrea Arcangeli and Giuseppe Scrivano.
> > > > @Andrea and @Giuseppe, could you clarify more in case I missed
> > > > something?
> > >
> > > We've discussed having a notifier-using filter be able to replace its
> > > filter. This would allow this and other use cases without any
> > > additional eBPF or cBPF code.
> > >
> >
> > A notifier is not always a solution (even ignoring its perf overhead).
> >
> > One problem, pointed out by Andrea Arcangeli, is that notifiers need
> > userspace daemons. So, it can hardly be used by daemonless container
> > engines like Podman.
>
> I'm not sure I buy this argument. Podman already has a conmon instance
> for each container, this could be a child of that conmon process, or
> live inside conmon itself.
>
> Tycho

I checked with Andrea Arcangeli and Giuseppe Scrivano, who are working on Podman. You are right that Podman is not completely daemonless. However, "the fact it's not entirely daemonless doesn't imply it's a good idea to make it worse and to add complexity to the background conmon daemon or to add more daemons."

TL;DR: user notifiers are certainly more flexible, but they are also more expensive and more complex to implement than eBPF filters.

/* I'll reply to Sargun's performance argument in a separate email */

I'm sure you know Podman well, but let me still borrow some words from Andrea and Giuseppe (all credit for the Podman/crun details is theirs) to elaborate on the point for folks cc'ed on the list who are less familiar with Podman.

Basically, the current order goes as follows:

podman -> conmon -> crun -> container_binary
                      \
                       - seccomp done at crun level, not conmon

At runtime, what's left is:

conmon -> container_binary /* podman disappears; crun disappears */

So, to use seccomp notify to block `exec`, we can either start container_binary under a seccomp agent wrapper, or bloat the conmon binary (as pointed out by Tycho).
If we go with the first approach, we will have:

podman -> conmon -> crun -> seccomp_agent -> container_binary

So, at runtime we'd be left with one more daemon:

conmon -> seccomp_agent -> container_binary

Clearly, nobody likes one more daemon.

So, the proposal from Giuseppe was (and is) to use user notifiers as plugins (.so) loaded by conmon:

https://github.com/containers/conmon/pull/190
https://github.com/containers/crun/pull/438

Now, with eBPF filter support, one can implement the same thing using an embarrassingly simple eBPF filter, and, thanks to Giuseppe, this is well supported by crun.

--
Tianyin Xu
University of Illinois at Urbana-Champaign
https://tianyin.github.io/