On Wed, Jan 22, 2020 at 02:05:47PM +0100, Jakub Sitnicki wrote:
> Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose
> reuseport_lock") has introduced lazy generation of reuseport group IDs that
> survive group resize.
>
> By comparing the identifier we check if BPF reuseport program is not trying
> to select a socket from a BPF map that belongs to a different reuseport
> group than the one the packet is for.
>
> Because SOCKARRAY used to be the only BPF map type that can be used with
> reuseport BPF, it was possible to delay the generation of reuseport group
> ID until a socket from the group was inserted into BPF map for the first
> time.
>
> Now that SOCKMAP can be used with reuseport BPF we have two options, either
> generate the reuseport ID on map update, like SOCKARRAY does, or allocate
> an ID from the start when reuseport group gets created.
>
> This patch takes the latter approach to keep SOCKMAP free of calls into
> reuseport code. This streamlines the reuseport_id access as its lifetime
> now matches the longevity of reuseport object.
>
> The cost of this simplification, however, is that we allocate reuseport IDs
> for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their
> setups. With the way identifiers are currently generated, we can have at
> most S32_MAX reuseport groups, which hopefully is sufficient.
Not sure if it would be a concern. I think it is good as is.
For TCP, that would mean a billion different ip:port listening socks in
inet_hashinfo.

If it came to that, another idea is to use a 64-bit reuseport_id which
practically won't wrap around. It could use the very first sk->sk_cookie
as the reuseport_id. All the ida logic would go away also, at the expense
of +4 bytes.

> Another change is that we now always call into SOCKARRAY logic to unlink
> the socket from the map when unhashing or closing the socket. Previously we
> did it only when at least one socket from the group was in a BPF map.
>
> It is worth noting that this doesn't conflict with SOCKMAP tear-down in
> case a socket is in a SOCKMAP and belongs to a reuseport group. SOCKMAP
> tear-down happens first:
>
> prot->unhash
> `- tcp_bpf_unhash
>    |- tcp_bpf_remove
>    |  `- while (sk_psock_link_pop(psock))
>    |     `- sk_psock_unlink
>    |        `- sock_map_delete_from_link
>    |           `- __sock_map_delete
>    |              `- sock_map_unref
>    |                 `- sk_psock_put
>    |                    `- sk_psock_drop
>    |                       `- rcu_assign_sk_user_data(sk, NULL)
>    `- inet_unhash
>       `- reuseport_detach_sock
>          `- bpf_sk_reuseport_detach
>             `- WRITE_ONCE(sk->sk_user_data, NULL)

Thanks for the details.

[ ... ]

> @@ -200,12 +189,10 @@ void reuseport_detach_sock(struct sock *sk)
> 	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
> 					  lockdep_is_held(&reuseport_lock));
>
> -	/* At least one of the sk in this reuseport group is added to
> -	 * a bpf map. Notify the bpf side. The bpf map logic will
> -	 * remove the sk if it is indeed added to a bpf map.
> +	/* Notify the bpf side. The sk may be added to bpf map. The
> +	 * bpf map logic will remove the sk from the map if indeed.
s/indeed/needed/ ?

I think it would be good to have a few words here, e.g. that this is
needed by sockarray but not necessary for sockmap, which has its own
->unhash to remove itself from the map.

Others lgtm.

> 	 */
> -	if (reuse->reuseport_id)
> -		bpf_sk_reuseport_detach(sk);
> +	bpf_sk_reuseport_detach(sk);
>
> 	rcu_assign_pointer(sk->sk_reuseport_cb, NULL);
>
> --
> 2.24.1
>