On Wed, Jan 22, 2020 at 02:05:47PM +0100, Jakub Sitnicki wrote:
> Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose
> reuseport_lock") has introduced lazy generation of reuseport group IDs that
> survive group resize.
>
> By comparing the identifier we check if BPF reuseport program is not trying
> to select a socket from a BPF map that belongs to a different reuseport
> group than the one the packet is for.
>
> Because SOCKARRAY used to be the only BPF map type that can be used with
> reuseport BPF, it was possible to delay the generation of reuseport group
> ID until a socket from the group was inserted into BPF map for the first
> time.
>
> Now that SOCKMAP can be used with reuseport BPF we have two options, either
> generate the reuseport ID on map update, like SOCKARRAY does, or allocate
> an ID from the start when reuseport group gets created.
>
> This patch takes the latter approach to keep SOCKMAP free of calls into
> reuseport code. This streamlines the reuseport_id access as its lifetime
> now matches the longevity of reuseport object.
>
> The cost of this simplification, however, is that we allocate reuseport IDs
> for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their
> setups. With the way identifiers are currently generated, we can have at
> most S32_MAX reuseport groups, which hopefully is sufficient.
Not sure if it would be a concern. I think it is good as is.
For TCP, that would mean a billion different ip:port listening socks in
inet_hashinfo.

If it came to that, another idea is to use a 64-bit reuseport_id which
practically won't wrap around. It could use the very first sk->sk_cookie
as the reuseport_id. All the ida logic would go away also, at the expense
of +4 bytes.

> Another change is that we now always call into SOCKARRAY logic to unlink
> the socket from the map when unhashing or closing the socket. Previously we
> did it only when at least one socket from the group was in a BPF map.
>
> It is worth noting that this doesn't conflict with SOCKMAP tear-down in
> case a socket is in a SOCKMAP and belongs to a reuseport group. SOCKMAP
> tear-down happens first:
>
> prot->unhash
> `- tcp_bpf_unhash
>    |- tcp_bpf_remove
>    |  `- while (sk_psock_link_pop(psock))
>    |     `- sk_psock_unlink
>    |        `- sock_map_delete_from_link
>    |           `- __sock_map_delete
>    |              `- sock_map_unref
>    |                 `- sk_psock_put
>    |                    `- sk_psock_drop
>    |                       `- rcu_assign_sk_user_data(sk, NULL)
>    `- inet_unhash
>       `- reuseport_detach_sock
>          `- bpf_sk_reuseport_detach
>             `- WRITE_ONCE(sk->sk_user_data, NULL)

Thanks for the details.

[ ... ]

> @@ -200,12 +189,10 @@ void reuseport_detach_sock(struct sock *sk)
> 	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
> 					  lockdep_is_held(&reuseport_lock));
>
> -	/* At least one of the sk in this reuseport group is added to
> -	 * a bpf map. Notify the bpf side. The bpf map logic will
> -	 * remove the sk if it is indeed added to a bpf map.
> +	/* Notify the bpf side. The sk may be added to bpf map. The
> +	 * bpf map logic will remove the sk from the map if indeed.
s/indeed/needed/ ?

I think it would be good to have a few words here, e.g. that this is
needed by sockarray but not necessary for sockmap, which has its own
->unhash to remove itself from the map.

Others lgtm.

> 	 */
> -	if (reuse->reuseport_id)
> -		bpf_sk_reuseport_detach(sk);
> +	bpf_sk_reuseport_detach(sk);
>
> 	rcu_assign_pointer(sk->sk_reuseport_cb, NULL);
>
> --
> 2.24.1
>