On Thu, Jan 14, 2021 at 6:38 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Jan 14, 2021 at 4:19 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > We are playing with doing hybrid conntrack where BPF generates
> > connect/disconnect/etc events and puts them into perfbuf (or, later,
> > the new ringbuf). We can get most of the functionality out of
> > existing hooks:
> > - BPF_CGROUP_SOCK_OPS fully covers TCP
> > - BPF_CGROUP_UDP4_SENDMSG covers unconnected UDP (with sampling, etc)
> >
> > The only missing bit is connected UDP, where we can get some
> > information from the existing BPF_CGROUP_INET{4,6}_CONNECT if the caller
> > did an explicit bind(); otherwise, in the autobind case, we get
> > only the destination addr/port and no source port, because this hook
> > triggers prior to that.
> >
> > We'd really like to avoid the cost of BPF_CGROUP_INET_EGRESS
> > and filtering UDP (which covers both connected and unconnected UDP,
> > but loses that connect/disconnect pseudo signal).
> >
> > The proposal is to add a new BPF_CGROUP_INET_SOCK_POST_CONNECT which
> > triggers right before sys_connect exits in the AF_INET{,6} case.
> > The context is bpf_sock, which lets BPF examine the socket state.
> > There is really no reason for it to trigger for all inet socks;
> > I've considered adding BPF_CGROUP_UDP_POST_CONNECT, but decided
> > that it might be better to have a generic inet case.
> >
> > The new hook triggers right before sys_connect() returns and gives
> > BPF an opportunity to explore the source & destination addresses
> > as well as the ability to return EPERM to the user.
> >
> > This is somewhat analogous to the existing BPF_CGROUP_INET{4,6}_POST_BIND
> > hooks, with the intention to log the connection addresses (after autobind).
> >
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > Change-Id: I46d0122f93c58b17bfae5ba5040b0b0343908c19
> > ---
> >  include/linux/bpf-cgroup.h | 17 +++++++++++++++++
> >  include/uapi/linux/bpf.h   |  1 +
> >  kernel/bpf/syscall.c       |  3 +++
> >  net/core/filter.c          |  4 ++++
> >  net/ipv4/af_inet.c         |  7 ++++++-
> >  5 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index 72e69a0e1e8c..f110935258b9 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -213,12 +213,29 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> >         __ret;                                                          \
> >  })
> >
> > +#define BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, type)                        \
> > +({                                                                     \
> > +       int __ret = 0;                                                  \
> > +       if (cgroup_bpf_enabled) {                                       \
> > +               lock_sock(sk);                                          \
> > +               __ret = __cgroup_bpf_run_filter_sk(sk, type);           \
> > +               release_sock(sk);                                       \
> > +       }                                                               \
> > +       __ret;                                                          \
> > +})
> > +
> >  #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)                              \
> >         BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE)
> >
> >  #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk)                      \
> >         BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE)
> >
> > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sk)                 \
> > +       BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT)
> > +
> > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk)          \
> > +       BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT)
> > +
> >  #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk)                        \
> >         BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index a1ad32456f89..3235f7bd131f 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -241,6 +241,7 @@ enum bpf_attach_type {
> >         BPF_XDP_CPUMAP,
> >         BPF_SK_LOOKUP,
> >         BPF_XDP,
> > +       BPF_CGROUP_INET_SOCK_POST_CONNECT,
>
> Adding new bpf_attach_type enums keeps blowing up the size of struct
> cgroup_bpf. Right now we have 38 different values, of which 15 values
> are not related to cgroups (judging by their name). That results in
> 15 * (8 + 16 + 4) = 420 extra bytes wasted for each struct cgroup_bpf
> (and thus struct cgroup). Probably not critical, but it would be nice
> to not waste space unnecessarily.
>
> Would anyone be interested in addressing this? Basically, instead of
> using MAX_BPF_ATTACH_TYPE from enum bpf_attach_type, we'd need to have
> a cgroup-specific enumeration and a mapping from bpf_attach_type to that
> bpf_cgroup_attach_type to compactly store the information in struct
> cgroup_bpf. Thoughts?

Sure, I can get to that at some point if nobody beats me to it.
Assuming we have 10k cgroups on the machine, it would save about 4MB,
which doesn't look alarming.

> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index c3bb03c8371f..7d6fd1e32d22 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1958,6 +1958,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
> >                 switch (expected_attach_type) {
> >                 case BPF_CGROUP_INET_SOCK_CREATE:
> >                 case BPF_CGROUP_INET_SOCK_RELEASE:
> > +               case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >                 case BPF_CGROUP_INET4_POST_BIND:
> >                 case BPF_CGROUP_INET6_POST_BIND:
> >                         return 0;
> > @@ -2910,6 +2911,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> >                 return BPF_PROG_TYPE_CGROUP_SKB;
> >         case BPF_CGROUP_INET_SOCK_CREATE:
> >         case BPF_CGROUP_INET_SOCK_RELEASE:
> > +       case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >         case BPF_CGROUP_INET4_POST_BIND:
> >         case BPF_CGROUP_INET6_POST_BIND:
> >                 return BPF_PROG_TYPE_CGROUP_SOCK;
> > @@ -3063,6 +3065,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
> >         case BPF_CGROUP_INET_EGRESS:
> >         case BPF_CGROUP_INET_SOCK_CREATE:
> >         case BPF_CGROUP_INET_SOCK_RELEASE:
> > +       case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >         case BPF_CGROUP_INET4_BIND:
> >         case BPF_CGROUP_INET6_BIND:
> >         case BPF_CGROUP_INET4_POST_BIND:
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 9ab94e90d660..d955321d3415 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -7683,12 +7683,14 @@ static bool __sock_filter_check_attach_type(int off,
> >                 switch (attach_type) {
> >                 case BPF_CGROUP_INET_SOCK_CREATE:
> >                 case BPF_CGROUP_INET_SOCK_RELEASE:
> > +               case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >                         goto full_access;
> >                 default:
> >                         return false;
> >                 }
> >         case bpf_ctx_range(struct bpf_sock, src_ip4):
> >                 switch (attach_type) {
> > +               case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >                 case BPF_CGROUP_INET4_POST_BIND:
> >                         goto read_only;
> >                 default:
> > @@ -7696,6 +7698,7 @@ static bool __sock_filter_check_attach_type(int off,
> >         case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> >                 switch (attach_type) {
> > +               case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >                 case BPF_CGROUP_INET6_POST_BIND:
> >                         goto read_only;
> >                 default:
> > @@ -7703,6 +7706,7 @@ static bool __sock_filter_check_attach_type(int off,
> >         case bpf_ctx_range(struct bpf_sock, src_port):
> >                 switch (attach_type) {
> > +               case BPF_CGROUP_INET_SOCK_POST_CONNECT:
> >                 case BPF_CGROUP_INET4_POST_BIND:
> >                 case BPF_CGROUP_INET6_POST_BIND:
> >                         goto read_only;
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index b94fa8eb831b..568654cafa48 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -574,7 +574,10 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
> >
> >         if (!inet_sk(sk)->inet_num && inet_autobind(sk))
> >                 return -EAGAIN;
> > -       return sk->sk_prot->connect(sk, uaddr, addr_len);
> > +       err = sk->sk_prot->connect(sk, uaddr, addr_len);
> > +       if (!err)
> > +               err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk);
> > +       return err;
> >  }
> >  EXPORT_SYMBOL(inet_dgram_connect);
> >
>
> Have you tried attaching an fexit program to inet_dgram_connect?
> Doesn't it give all the information you need?

> > @@ -723,6 +726,8 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> >
> >         lock_sock(sock->sk);
> >         err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
>
> Similarly here, attaching fexit to __inet_stream_connect would execute
> your BPF program at exactly the same time (and then you can check the
> err value).
>
> Or is the point here to have a more "stable" BPF program type?

Good suggestion, I can try playing with it; I think it should give me
all the info I need (I only need the sock). That said, I'd still prefer
a stable interface against the stable struct bpf_sock context, but maybe
fexit will also work.
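
For the connected-UDP case, I'm imagining something along these lines
(completely untested sketch, assuming a vmlinux.h/CO-RE build and a kernel
where fexit can attach to inet_dgram_connect; the map, struct and program
names below are made up for illustration):

/* SPDX-License-Identifier: GPL-2.0 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#ifndef AF_INET
#define AF_INET 2	/* not emitted into vmlinux.h */
#endif

/* ring buffer the userspace conntrack daemon reads events from */
struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 20);
} events SEC(".maps");

struct connect_event {
	__u32 saddr;	/* network byte order */
	__u32 daddr;	/* network byte order */
	__u16 sport;	/* host byte order (skc_num) */
	__u16 dport;	/* network byte order (skc_dport) */
};

SEC("fexit/inet_dgram_connect")
int BPF_PROG(udp_post_connect, struct socket *sock, struct sockaddr *uaddr,
	     int addr_len, int flags, int ret)
{
	struct connect_event *e;
	struct sock *sk;

	if (ret)	/* connect() failed, nothing to log */
		return 0;

	sk = BPF_CORE_READ(sock, sk);
	if (!sk || BPF_CORE_READ(sk, __sk_common.skc_family) != AF_INET)
		return 0;

	e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e)
		return 0;

	/* autobind has already run at this point, so the source port is set */
	e->sport = BPF_CORE_READ(sk, __sk_common.skc_num);
	e->dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
	e->saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
	e->daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);

	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

A similar fexit program on __inet_stream_connect (gated on the err value,
0 or -EINPROGRESS) should presumably cover the TCP path, although we
already get that signal from BPF_CGROUP_SOCK_OPS.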