Re: [PATCH v5 0/6] Add eBPF hooks for cgroups

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Wed, 14 Sep 2016 13:42:49 +0200

On 09/14/2016 01:13 PM, Daniel Mack wrote:
On 09/13/2016 07:24 PM, Pablo Neira Ayuso wrote:
On Tue, Sep 13, 2016 at 03:31:20PM +0200, Daniel Mack wrote:
On 09/13/2016 01:56 PM, Pablo Neira Ayuso wrote:
On Mon, Sep 12, 2016 at 06:12:09PM +0200, Daniel Mack wrote:
This is v5 of the patch set to allow eBPF programs for network
filtering and accounting to be attached to cgroups, so that they apply
to all sockets of all tasks placed in that cgroup. The logic can also
be extended to other cgroup-based eBPF programs.

1) This infrastructure can only be useful to systemd, or any similar
    orchestration daemon. Look, you can only apply filtering policies
    to processes that are launched by systemd, so this only works
    for server processes.

Sorry, but neither statement is true. The eBPF policies apply to every
process that is placed in a cgroup, and my example program in 6/6 shows
how that can be done from the command line.

Then you have to explain to me how anyone other than systemd can use
this infrastructure.

I have no idea what makes you think this is limited to systemd. As I
said, I provided an example for userspace that works from the command
line. The same limitations apply as for all other users of cgroups.

My main point is that those processes *need* to be launched by the
orchestrator, which is what I was referring to as 'server processes'.

Yes, that's right. But as I said, this rule applies to many other kernel
concepts, so I don't see any real issue.

That's a limitation that applies to many more control mechanisms in the
kernel, and it's something that can easily be solved with fork+exec.

As long as you control how the processes are launched, yes, but this
will not work in other scenarios. Just like cgroup net_cls and friends
are broken for filtering processes whose fork+exec you have no
control over.

Probably, but that's only solvable with rules that store the full cgroup
path then, and do a string comparison (!) for each packet flying by.

That's just as transparent as SO_ATTACH_FILTER. What kind of
introspection mechanism do you have in mind?

SO_ATTACH_FILTER is called from the process itself, so this is a local
filtering policy that you apply to your own process.

Not necessarily. You can just as well do it the inetd way, and pass the
socket to a process that is launched on demand, but do SO_ATTACH_FILTER
+ SO_LOCK_FILTER in the middle. What happens with payload on the socket
is not transparent to the launched binary at all. The proposed cgroup
eBPF solution implements a very similar behavior in that regard.

It's about filtering outgoing network packets of applications, and
providing them with L2 information for filtering purposes. I don't think
that's a very specific use-case.

When the feature is not used at all, the added costs on the output path
are close to zero, due to the use of static branches.

*You're proposing a socket filtering facility that hooks layer 2
output path*!

As I said, I'm open to discussing that. In order to make it work for L3,
the LL_OFF issues need to be solved, as Daniel explained. Daniel,
Alexei, any idea how much work that would be?

Not much. You simply need to declare your own struct bpf_verifier_ops
with a get_func_proto() handler that handles BPF_FUNC_skb_load_bytes,
and the do_check() loop in the verifier would need to ensure that
ld_abs/ld_ind instructions are rejected for BPF_PROG_TYPE_CGROUP_SOCKET.

That is only a rough ~30-line kernel patchset to support this in
netfilter and only one extra input hook, with potential access to
conntrack and better integration with other existing subsystems.

Care to share the patches for that? I'd really like to have a look.

And FWIW, I agree with Thomas - there is nothing wrong with having
multiple options to use for such use-cases.

Thanks,
Daniel
