On Wed, Jan 27, 2021 at 10:24 AM Andrey Ignatov <rdna@xxxxxx> wrote:
>
> Stanislav Fomichev <sdf@xxxxxxxxxx> [Tue, 2021-01-26 11:36 -0800]:
> > At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
> > to the privileged ones (< ip_unprivileged_port_start), but it will
> > be rejected later on in the __inet_bind or __inet6_bind.
> >
> > Let's add another return value to indicate that CAP_NET_BIND_SERVICE
> > check should be ignored. Use the same idea as we currently use
> > in cgroup/egress where bit #1 indicates CN. Instead, for
> > cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should
> > be bypassed.
> >
> > v4:
> > - Add missing IPv6 support (Martin KaFai Lau)
> >
> > v3:
> > - Update description (Martin KaFai Lau)
> > - Fix capability restore in selftest (Martin KaFai Lau)
> >
> > v2:
> > - Switch to explicit return code (Martin KaFai Lau)
> >
> > Cc: Andrey Ignatov <rdna@xxxxxx>
> > Cc: Martin KaFai Lau <kafai@xxxxxx>
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
>
> Explicit return code looks much cleaner than both what v1 did and what I
> proposed earlier (compare port before/after).
>
> Just one nit from me but otherwise looks good.
>
> Acked-by: Andrey Ignatov <rdna@xxxxxx>
>
> ...
> > @@ -231,30 +232,48 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> >
> >  #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, type)			       \
> >  ({									       \
> > +	u32 __unused_flags;						       \
> >  	int __ret = 0;							       \
> >  	if (cgroup_bpf_enabled(type))					       \
> >  		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
> > -							  NULL);	       \
> > +							  NULL,		       \
> > +							  &__unused_flags);    \
> >  	__ret;								       \
> >  })
> >
> >  #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		       \
> >  ({									       \
> > +	u32 __unused_flags;						       \
> >  	int __ret = 0;							       \
> >  	if (cgroup_bpf_enabled(type)) {					       \
> >  		lock_sock(sk);						       \
> >  		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
> > -							  t_ctx);	       \
> > +							  t_ctx,	       \
> > +							  &__unused_flags);    \
> >  		release_sock(sk);					       \
> >  	}								       \
> >  	__ret;								       \
> >  })
> >
> > -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr)		       \
> > -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
> > -
> > -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr)		       \
> > -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
> > +/* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
> > + * via upper bits of return code. The only flag that is supported
> > + * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
> > + * should be bypassed.
> > + */
> > +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags)	       \
> > +({									       \
> > +	u32 __flags = 0;						       \
> > +	int __ret = 0;							       \
> > +	if (cgroup_bpf_enabled(type)) {					       \
> > +		lock_sock(sk);						       \
> > +		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
> > +							  NULL, &__flags);     \
> > +		release_sock(sk);					       \
> > +		if (__flags & 1)					       \
> > +			*flags |= BIND_NO_CAP_NET_BIND_SERVICE;		       \
>
> Nit: It took me some time to realize that there are two different
> "flags": one to pass to __cgroup_bpf_run_filter_sock_addr() and another
> to pass to __inet{,6}_bind/BPF_CGROUP_RUN_PROG_INET_BIND_LOCK that both
> carry the "BIND_NO_CAP_NET_BIND_SERVICE" flag but do it differently:
> * hard-coded 0x1 in the former case;
> * and BIND_NO_CAP_NET_BIND_SERVICE == (1 << 3) in the latter.
>
> I'm not sure how to make it more readable: maybe name `flags` and
> `__flags` differently to highlight the difference (`bind_flags` and
> `__flags`?) and add a #define for the "1" here?
>
> In any case IMO it's not worth a respin and can be addressed by a
> follow-up if you agree.

Yeah, I agree. I didn't stress too much about it because we also have
ret and _ret in BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY (and now
BPF_PROG_RUN_ARRAY_FLAGS), but it does look confusing. Let me respin
with bind_flags; it shouldn't be too much work and can help with
readability in the future.

Thanks for the review!