Re: [PATCH bpf-next v4 1/2] bpf: allow rewriting to ports under ip_unprivileged_port_start

Andrey Ignatov <rdna@xxxxxx> · Wed, 27 Jan 2021 10:24:29 -0800

Stanislav Fomichev <sdf@xxxxxxxxxx> [Tue, 2021-01-26 11:36 -0800]:
> At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
> to the privileged ones (< ip_unprivileged_port_start), but it will
> be rejected later on in the __inet_bind or __inet6_bind.
> 
> Let's add another return value to indicate that CAP_NET_BIND_SERVICE
> check should be ignored. Use the same idea as we currently use
> in cgroup/egress where bit #1 indicates CN. Instead, for
> cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should
> be bypassed.
> 
> v4:
> - Add missing IPv6 support (Martin KaFai Lau)
> 
> v3:
> - Update description (Martin KaFai Lau)
> - Fix capability restore in selftest (Martin KaFai Lau)
> 
> v2:
> - Switch to explicit return code (Martin KaFai Lau)
> 
> Cc: Andrey Ignatov <rdna@xxxxxx>
> Cc: Martin KaFai Lau <kafai@xxxxxx>
> Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>

Explicit return code looks much cleaner than both what v1 did and what I
proposed earlier (compare port before/after).

Just one nit from me, but otherwise looks good.

Acked-by: Andrey Ignatov <rdna@xxxxxx>

...
> @@ -231,30 +232,48 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
>  
>  #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, type)				       \
>  ({									       \
> +	u32 __unused_flags;						       \
>  	int __ret = 0;							       \
>  	if (cgroup_bpf_enabled(type))					       \
>  		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
> -							  NULL);	       \
> +							  NULL,		       \
> +							  &__unused_flags);    \
>  	__ret;								       \
>  })
>  
>  #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		       \
>  ({									       \
> +	u32 __unused_flags;						       \
>  	int __ret = 0;							       \
>  	if (cgroup_bpf_enabled(type))	{				       \
>  		lock_sock(sk);						       \
>  		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
> -							  t_ctx);	       \
> +							  t_ctx,	       \
> +							  &__unused_flags);    \
>  		release_sock(sk);					       \
>  	}								       \
>  	__ret;								       \
>  })
>  
> -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
> -
> -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
> +/* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
> + * via upper bits of return code. The only flag that is supported
> + * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
> + * should be bypassed.
> + */
> +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags)	       \
> +({									       \
> +	u32 __flags = 0;						       \
> +	int __ret = 0;							       \
> +	if (cgroup_bpf_enabled(type))	{				       \
> +		lock_sock(sk);						       \
> +		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
> +							  NULL, &__flags);     \
> +		release_sock(sk);					       \
> +		if (__flags & 1)					       \
> +			*flags |= BIND_NO_CAP_NET_BIND_SERVICE;		       \

Nit: It took me some time to realize that there are two different
"flags": one to pass to __cgroup_bpf_run_filter_sock_addr() and another
to pass to __inet{,6}_bind/BPF_CGROUP_RUN_PROG_INET_BIND_LOCK that both carry
"BIND_NO_CAP_NET_BIND_SERVICE" flag but do it differently:
* hard-coded 0x1 in the former case;
* and BIND_NO_CAP_NET_BIND_SERVICE == (1 << 3) in the latter.

I'm not sure how to make it more readable: maybe name `flags` and
`__flags` differently to highlight the difference (`bind_flags` and
`__flags`?) and add a #define for the "1" here?

In any case, IMO it's not worth a respin and can be addressed in a
follow-up if you agree.

> +	}								       \
> +	__ret;								       \
> +})
>  
>  #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
>  	((cgroup_bpf_enabled(BPF_CGROUP_INET4_CONNECT) ||		       \

-- 
Andrey Ignatov