On Wed, May 06, 2020 at 09:22:45AM -0700, sdf@xxxxxxxxxx wrote: > On 05/05, Martin KaFai Lau wrote: > > On Tue, May 05, 2020 at 01:27:29PM -0700, Stanislav Fomichev wrote: > > > We want to have a tighter control on what ports we bind to in > > > the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means > > > connect() becomes slightly more expensive. The expensive part > > > comes from the fact that we now need to call inet_csk_get_port() > > > that verifies that the port is not used and allocates an entry > > > in the hash table for it. > > > > > > Since we can't rely on "snum || !bind_address_no_port" to prevent > > > us from calling POST_BIND hook anymore, let's add another bind flag > > > to indicate that the call site is BPF program. > > > > > > v2: > > > * Update documentation (Andrey Ignatov) > > > * Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov) > > > > > > Cc: Andrey Ignatov <rdna@xxxxxx> > > > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx> > > > --- > > > include/net/inet_common.h | 2 + > > > include/uapi/linux/bpf.h | 9 +- > > > net/core/filter.c | 18 ++- > > > net/ipv4/af_inet.c | 10 +- > > > net/ipv6/af_inet6.c | 12 +- > > > tools/include/uapi/linux/bpf.h | 9 +- > > > .../bpf/prog_tests/connect_force_port.c | 104 ++++++++++++++++++ > > > .../selftests/bpf/progs/connect_force_port4.c | 28 +++++ > > > .../selftests/bpf/progs/connect_force_port6.c | 28 +++++ > > > 9 files changed, 192 insertions(+), 28 deletions(-) > > > create mode 100644 > > tools/testing/selftests/bpf/prog_tests/connect_force_port.c > > > create mode 100644 > > tools/testing/selftests/bpf/progs/connect_force_port4.c > > > create mode 100644 > > tools/testing/selftests/bpf/progs/connect_force_port6.c > > > > > > diff --git a/include/net/inet_common.h b/include/net/inet_common.h > > > index c38f4f7d660a..cb2818862919 100644 > > > --- a/include/net/inet_common.h > > > +++ b/include/net/inet_common.h > > > @@ -39,6 +39,8 @@ int inet_bind(struct socket *sock, struct sockaddr > > *uaddr, int addr_len); > > > #define BIND_FORCE_ADDRESS_NO_PORT (1 << 0) > > > /* Grab and release socket lock. */ > > > #define BIND_WITH_LOCK (1 << 1) > > > +/* Called from BPF program. */ > > > +#define BIND_FROM_BPF (1 << 2) > > > int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, > > > u32 flags); > > > int inet_getname(struct socket *sock, struct sockaddr *uaddr, > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index b3643e27e264..14b5518a3d5b 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -1994,10 +1994,11 @@ union bpf_attr { > > > * > > > * This helper works for IPv4 and IPv6, TCP and UDP sockets. The > > > * domain (*addr*\ **->sa_family**) must be **AF_INET** (or > > > - * **AF_INET6**). Looking for a free port to bind to can be > > > - * expensive, therefore binding to port is not permitted by the > > > - * helper: *addr*\ **->sin_port** (or **sin6_port**, respectively) > > > - * must be set to zero. > > > + * **AF_INET6**). It's advised to pass zero port (**sin_port** > > > + * or **sin6_port**) which triggers IP_BIND_ADDRESS_NO_PORT-like > > > + * behavior and lets the kernel reuse the same source port > > Reading "zero port" and "the same source port" together is confusing. > Ack, let me try rephrase it a bit to make it more clear. > > > > + * as long as 4-tuple is unique. Passing non-zero port might > > > + * lead to degraded performance. > > Is the "degraded performance" also true for UDP? > I suppose everything that is "allocating" port at bind time can lead > to a faster port exhaustion, so UDP should be also affected. > Although, looking at udp_v4_get_port, it looks less involved than > its TCP counterpart. I was mostly curious after taking a quick look in both TCP and UDP. It does not affect the patch here. May be something for optimization later. >From skimming the codes, I was thinking UDP should be more or less the same since they have to go through get_port() eventually. TCP will be degraded from inet_hash_connect() [which seems to be using the ehash and 4 tuples] to get_port().