Re: [PATCH net-next 0/3] eBPF Seccomp filters

Brian Goff <cpuguy83@xxxxxxxxx> · Tue, 13 Feb 2018 12:07:08 -0500

Agreed. I like the idea, but we'll have to maintain backwards compatibility
at the docker/runc level. That doesn't mean it shouldn't be added;
it may just take a long time to add support.

On Tue, Feb 13, 2018 at 12:02 PM, Jessie Frazelle <me@xxxxxxxxxxxx> wrote:

> On Tue, Feb 13, 2018 at 11:29 AM, Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > On Tue, Feb 13, 2018 at 7:47 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >> On Tue, Feb 13, 2018 at 7:42 AM, Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >>> This patchset enables seccomp filters to be written in eBPF. Although
> >>> this patchset doesn't introduce much of the functionality enabled by
> >>> eBPF, it lays the groundwork for it.
> >>>
> >>> It also introduces the capability to dump eBPF filters via the PTRACE
> >>> API in order to make it so that CHECKPOINT_RESTORE will be satisfied.
> >>> In the attached samples, there's an example of this. One can then use
> >>> BPF_OBJ_GET_INFO_BY_FD in order to get the actual code of the program,
> >>> and use that at reload time.
> >>>
> >>> The primary reason for not adding maps support in this patchset is
> >>> to avoid introducing new complexities around PR_SET_NO_NEW_PRIVS.
> >>> If we have a map that the BPF program can read, it can potentially
> >>> "change" privileges after running. It seems like allowing only writes
> >>> is safe, because the program can then be pure and side-effect free,
> >>> and therefore not negatively affect PR_SET_NO_NEW_PRIVS. Nonetheless, if we come
> >>> to an agreement, this can be in a follow-up patchset.
> >>
> >> What's the reason for adding eBPF support? seccomp shouldn't need it,
> >> and it only makes the code more complex. I'd rather stick with cBPF
> >> until we have an overwhelmingly good reason to use eBPF as a "native"
> >> seccomp filter language.
> >>
> >> -Kees
> >>
> > Three reasons:
> > 1) The userspace tooling for eBPF is much better than the user space
> > tooling for cBPF. Our use case is specifically to optimize Docker
> > policies. This is roughly what their seccomp policy looks like:
> > https://github.com/moby/moby/blob/master/profiles/seccomp/default.json.
> > It would be much nicer to be able to leverage eBPF to write this in C,
> > or any of the other languages targeting eBPF. In addition, if we
> > have write-only maps, we can exfiltrate information from seccomp, like
> > arguments, and errors in a relatively cheap way compared to cBPF, and
> > then extract this via the bcc stack. Writing cBPF via C macros is a
> > pain, and the off-the-shelf cBPF libraries are getting no love. The
> > eBPF community is *exploding* with contributions.
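To illustrate the point about C macros, here is a minimal sketch of a hand-written cBPF seccomp filter. It is not taken from the patchset. It assumes x86-64 and the Linux uapi headers, and the names example_filter and example_prog are made up for this example. The policy (fail ptrace with EPERM, allow everything else) is likewise only an illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hypothetical example policy: every jump offset is counted by hand. */
static const struct sock_filter example_filter[] = {
    /* [0] Load the architecture field of struct seccomp_data. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] x86-64 (assumed): skip the kill at [2]; otherwise fall into it. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /* [2] Unexpected architecture: kill. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [3] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] ptrace: fall through to [5]; anything else: skip to [6]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_ptrace, 0, 1),
    /* [5] Fail the syscall with EPERM. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [6] Default: allow. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* The program descriptor that prctl(PR_SET_SECCOMP, ...) expects. */
static const struct sock_fprog example_prog = {
    .len = sizeof(example_filter) / sizeof(example_filter[0]),
    .filter = (struct sock_filter *)example_filter,
};
```

Installing it would be prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) followed by prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &example_prog). Adding one more rule in the middle means recounting the jt/jf offsets of every jump that crosses it, which is the pain being described.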
>
> Is stage two of this getting runc to support eBPF and docker to change
> the default to be written as eBPF? I foresee that being a problem,
> mainly with the kernel versions people use. The point of that
> patch was to help the most people, and since your point in (2) is
> about performance, that is a trade-off I would be willing to make in
> order to have this functionality on more kernel versions.
>
> The other alternative would be to have docker translate to use eBPF if
> the kernel supported it, but that amount of complexity seems a bit
> unnecessary for a feature that was trying to also be "simple".
>
> Or do you plan on wrapping filters onto processes tangentially from
> the runtime, in which case, that should be totally fine :)
>
> Anyways this is kinda a tangent from the main point of getting it in
> the kernel, just I would hate to see someone having to maintain this
> without there being a path to getting it upstream elsewhere.
>
> >
> > 2) In my testing, which so far has been very rudimentary, rewriting
> > the policy that libseccomp generates from the Docker policy to use
> > eBPF and eBPF maps performs much better than cBPF. The
> > specific case tested was to use a bpf array to look up rules for a
> > particular syscall. In a super trivial test, this was about 5% lower
> > latency than using traditional branches. If you need more evidence of
> > this, I can work a little bit more on the maps related patches, and
> > see if I can get some more benchmarking. From my understanding, we
> > would need to add "sealing" support for maps, in which they can be
> > marked as read-only, and only at that point should an eBPF seccomp
> > program be able to read from them.
> >
> > 3) Eventually, I'd like to use some more advanced capabilities of
> > eBPF, like being able to rewrite arguments safely (not things referred
> > to by pointers, but just plain old arguments).
> >
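The array-lookup versus branches comparison in (2) can be sketched in plain userspace C. This is only an illustration of the two evaluation strategies, not eBPF code and not the benchmark described above. The action codes, MAX_SYSCALLS, struct rule, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes and table size, chosen for this sketch only. */
enum action { ACT_KILL = 0, ACT_ERRNO = 1, ACT_ALLOW = 2 };
#define MAX_SYSCALLS 512

struct rule {
    uint32_t nr;      /* syscall number */
    enum action act;  /* what to do when it matches */
};

/* Branch-chain style: compare against each rule in turn, first match wins.
 * The cost grows with the number of rules. */
static enum action eval_branches(const struct rule *rules, size_t n,
                                 uint32_t nr)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].nr == nr)
            return rules[i].act;
    return ACT_KILL; /* default deny */
}

/* Build a dense table indexed by syscall number, once, ahead of time. */
static void build_table(uint8_t table[MAX_SYSCALLS],
                        const struct rule *rules, size_t n)
{
    for (size_t i = 0; i < MAX_SYSCALLS; i++)
        table[i] = ACT_KILL;
    /* Walk backwards so the first matching rule wins, as in eval_branches. */
    for (size_t i = n; i-- > 0;) {
        assert(rules[i].nr < MAX_SYSCALLS);
        table[rules[i].nr] = (uint8_t)rules[i].act;
    }
}

/* Array-lookup style: one bounds check and one load per syscall. */
static enum action eval_table(const uint8_t table[MAX_SYSCALLS], uint32_t nr)
{
    if (nr >= MAX_SYSCALLS)
        return ACT_KILL;
    return (enum action)table[nr];
}
```

Both functions return the same decision for every syscall number; the difference is that eval_table does constant work per call while eval_branches scans the rule list. Whether that translates into a measurable win inside a real filter depends on the policy size and the JIT, so the 5% figure above should not be read into this sketch.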
> >>>
> >>>
> >>> Sargun Dhillon (3):
> >>>   bpf, seccomp: Add eBPF filter capabilities
> >>>   seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp
> >>>     filters
> >>>   bpf: Add eBPF seccomp sample programs
> >>>
> >>>  arch/Kconfig                 |   7 ++
> >>>  include/linux/bpf_types.h    |   3 +
> >>>  include/linux/seccomp.h      |  12 +++
> >>>  include/uapi/linux/bpf.h     |   2 +
> >>>  include/uapi/linux/ptrace.h  |   5 +-
> >>>  include/uapi/linux/seccomp.h |  15 ++--
> >>>  kernel/bpf/syscall.c         |   1 +
> >>>  kernel/ptrace.c              |   3 +
> >>>  kernel/seccomp.c             | 185 ++++++++++++++++++++++++++++++++++++++-----
> >>>  samples/bpf/Makefile         |   9 +++
> >>>  samples/bpf/bpf_load.c       |   9 ++-
> >>>  samples/bpf/seccomp1_kern.c  |  17 ++++
> >>>  samples/bpf/seccomp1_user.c  |  34 ++++++++
> >>>  samples/bpf/seccomp2_kern.c  |  24 ++++++
> >>>  samples/bpf/seccomp2_user.c  |  66 +++++++++++++++
> >>>  15 files changed, 362 insertions(+), 30 deletions(-)
> >>>  create mode 100644 samples/bpf/seccomp1_kern.c
> >>>  create mode 100644 samples/bpf/seccomp1_user.c
> >>>  create mode 100644 samples/bpf/seccomp2_kern.c
> >>>  create mode 100644 samples/bpf/seccomp2_user.c
> >>>
> >>> --
> >>> 2.14.1
> >>>
> >>
> >>
> >>
> >> --
> >> Kees Cook
> >> Pixel Security
>
>
>
> --
>
>
> Jessie Frazelle
> 4096R / D4C4 DD60 0D66 F65A 8EFC  511E 18F3 685C 0022 BFF3
> pgp.mit.edu
> _______________________________________________
> Containers mailing list
> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>

-- 

- Brian Goff