On 05/04, Andrey Ignatov wrote:
Stanislav Fomichev <sdf@xxxxxxxxxx> [Mon, 2020-05-04 10:34 -0700]:
> We want to have a tighter control on what ports we bind to in
> the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
> connect() becomes slightly more expensive. The expensive part
> comes from the fact that we now need to call inet_csk_get_port()
> that verifies that the port is not used and allocates an entry
> in the hash table for it.
FWIW: Initially that IP_BIND_ADDRESS_NO_PORT limitation came from the
fact that in my specific use case (mysql client making 200-500 connects
per sec to mysql server) disabling the option made the application
pretty much unusable (inet_csk_get_port was taking more time than the
mysql client connect timeout == 3sec).
But I guess for some use-cases that call sys_connect not too often it
makes sense.
Yeah, I don't think we plan to reach those QPS numbers.
But, for the record, did you try to bind to a random port in that
case? And did you bail out on error or do a couple of retries?
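
For illustration only (this is not part of the patch): a minimal
userspace sketch of the "bind to a random port, retry a few times"
approach asked about above. The helper name, the loopback address, the
port range and the retry count are all assumptions made for the example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try up to max_tries random ports in [lo, hi]
 * on 127.0.0.1. Returns 0 and stores the chosen port on success,
 * -1 on failure.
 */
static int bind_random_port(int fd, unsigned short lo, unsigned short hi,
			    int max_tries, unsigned short *port_out)
{
	struct sockaddr_in addr;
	int i;

	for (i = 0; i < max_tries; i++) {
		unsigned short port = lo + rand() % (hi - lo + 1);

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons(port);

		if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
			*port_out = port;
			return 0;
		}
		if (errno != EADDRINUSE)
			return -1;	/* bail out on unexpected errors */
	}
	return -1;			/* every attempt hit a busy port */
}
```

Retrying only on EADDRINUSE keeps the loop bounded while still bailing
out immediately on errors that a different port would not fix.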
> Since we can't rely on "snum || !bind_address_no_port" to prevent
> us from calling POST_BIND hook anymore, let's add another bind flag
> to indicate that the call site is BPF program.
>
> Cc: Andrey Ignatov <rdna@xxxxxx>
> Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> ---
> include/net/inet_common.h | 2 +
> net/core/filter.c | 9 +-
> net/ipv4/af_inet.c | 10 +-
> net/ipv6/af_inet6.c | 12 +-
> .../bpf/prog_tests/connect_force_port.c | 104 ++++++++++++++++++
> .../selftests/bpf/progs/connect_force_port4.c | 28 +++++
> .../selftests/bpf/progs/connect_force_port6.c | 28 +++++
> 7 files changed, 177 insertions(+), 16 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/connect_force_port.c
> create mode 100644 tools/testing/selftests/bpf/progs/connect_force_port4.c
> create mode 100644 tools/testing/selftests/bpf/progs/connect_force_port6.c
Documentation in include/uapi/linux/bpf.h should be updated as well
since now it states this:
* **AF_INET6**). Looking for a free port to bind to can be
* expensive, therefore binding to port is not permitted by the
* helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
* must be set to zero.
IMO it's also worth keeping a note on the performance implications of
setting the port to non-zero.
Ah, thank you, will do!
> diff --git a/net/core/filter.c b/net/core/filter.c
> index fa9ddab5dd1f..fc5161b9ff6a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4527,29 +4527,24 @@ BPF_CALL_3(bpf_bind, struct bpf_sock_addr_kern *, ctx, struct sockaddr *, addr,
> struct sock *sk = ctx->sk;
> int err;
>
> - /* Binding to port can be expensive so it's prohibited in the helper.
> - * Only binding to IP is supported.
> - */
> err = -EINVAL;
> if (addr_len < offsetofend(struct sockaddr, sa_family))
> return err;
> if (addr->sa_family == AF_INET) {
> if (addr_len < sizeof(struct sockaddr_in))
> return err;
> - if (((struct sockaddr_in *)addr)->sin_port != htons(0))
> - return err;
> return __inet_bind(sk, addr, addr_len,
> + BIND_FROM_BPF |
> BIND_FORCE_ADDRESS_NO_PORT);
Should BIND_FORCE_ADDRESS_NO_PORT be passed only if port is zero?
Passing non zero port and BIND_FORCE_ADDRESS_NO_PORT at the same time
looks confusing (even though it works).
Makes sense, will remove it here, thx.
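
One possible reading of the suggestion above, as a standalone sketch
rather than the actual follow-up change: always pass BIND_FROM_BPF, and
add BIND_FORCE_ADDRESS_NO_PORT only when the requested port is zero.
The helper name is hypothetical and the flag values below are
illustrative assumptions, not copied from the patch.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define BIND_FORCE_ADDRESS_NO_PORT	(1 << 0)
#define BIND_FROM_BPF			(1 << 2)

/* Hypothetical helper: port_be is the port in network byte order,
 * as it would appear in sin_port / sin6_port.
 */
static uint32_t bpf_bind_flags(uint16_t port_be)
{
	uint32_t flags = BIND_FROM_BPF;

	/* Skip port allocation only when no port was requested. */
	if (port_be == htons(0))
		flags |= BIND_FORCE_ADDRESS_NO_PORT;

	return flags;
}
```

With this shape a non-zero port never travels together with the
"no port" flag, which avoids the confusing combination noted above.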