On Wed, Jan 22, 2020 at 11:53 PM CET, Martin Lau wrote:
> On Wed, Jan 22, 2020 at 02:05:47PM +0100, Jakub Sitnicki wrote:
>> Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose
>> reuseport_lock") has introduced lazy generation of reuseport group IDs that
>> survive group resize.
>>
>> By comparing the identifier we check that a BPF reuseport program is not
>> trying to select a socket from a BPF map that belongs to a different
>> reuseport group than the one the packet is for.
>>
>> Because SOCKARRAY used to be the only BPF map type that could be used with
>> reuseport BPF, it was possible to delay the generation of the reuseport
>> group ID until a socket from the group was inserted into a BPF map for the
>> first time.
>>
>> Now that SOCKMAP can be used with reuseport BPF we have two options: either
>> generate the reuseport ID on map update, like SOCKARRAY does, or allocate
>> an ID from the start, when the reuseport group gets created.
>>
>> This patch takes the latter approach to keep SOCKMAP free of calls into
>> reuseport code. This streamlines the reuseport_id access as its lifetime
>> now matches the longevity of the reuseport object.
>>
>> The cost of this simplification, however, is that we allocate reuseport IDs
>> for all SO_REUSEPORT users, even those that don't use SOCKARRAY in their
>> setups. With the way identifiers are currently generated, we can have at
>> most S32_MAX reuseport groups, which hopefully is sufficient.
> Not sure if it would be a concern. I think it is good as is.
> For TCP, that would mean a billion different ip:port listening socks
> in inet_hashinfo.
>
> If it came to that, another idea is to use a 64-bit reuseport_id which
> practically won't wrap around. It could use the very first sk->sk_cookie
> as the reuseport_id. All the ida logic would also go away, at the expense
> of +4 bytes.

Thanks for the idea. I'll add it to the patch description.
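[Editor's note, not part of the thread: the 64-bit alternative Martin describes can be sketched in userspace roughly as below. The names `reuseport_id_gen` and `reuseport_alloc_id` are illustrative, not kernel API; the actual proposal is to reuse the group's first sk->sk_cookie, which is already a 64-bit per-socket value, rather than a dedicated counter.]

```c
#include <stdint.h>

/* Hypothetical sketch: with a monotonically increasing 64-bit ID,
 * the ida allocate/free logic can go away entirely, because a u64
 * counter practically never wraps (at one allocation per nanosecond,
 * wrap-around would take ~584 years). */
static uint64_t reuseport_id_gen;

/* In the kernel this would need to be an atomic increment; a plain
 * increment suffices for a single-threaded illustration. */
static uint64_t reuseport_alloc_id(void)
{
	return ++reuseport_id_gen;
}
```

Compared to the ida scheme, IDs are never recycled, which is exactly why wrap-around becomes the only exhaustion concern, and a 64-bit space makes that moot.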
>
>>
>> Another change is that we now always call into SOCKARRAY logic to unlink
>> the socket from the map when unhashing or closing the socket. Previously we
>> did it only when at least one socket from the group was in a BPF map.
>>
>> It is worth noting that this doesn't conflict with SOCKMAP tear-down in
>> case a socket is in a SOCKMAP and belongs to a reuseport group. SOCKMAP
>> tear-down happens first:
>>
>>   prot->unhash
>>   `- tcp_bpf_unhash
>>      |- tcp_bpf_remove
>>      |  `- while (sk_psock_link_pop(psock))
>>      |     `- sk_psock_unlink
>>      |        `- sock_map_delete_from_link
>>      |           `- __sock_map_delete
>>      |              `- sock_map_unref
>>      |                 `- sk_psock_put
>>      |                    `- sk_psock_drop
>>      |                       `- rcu_assign_sk_user_data(sk, NULL)
>>      `- inet_unhash
>>         `- reuseport_detach_sock
>>            `- bpf_sk_reuseport_detach
>>               `- WRITE_ONCE(sk->sk_user_data, NULL)
> Thanks for the details.
>
> [ ... ]
>
>> @@ -200,12 +189,10 @@ void reuseport_detach_sock(struct sock *sk)
>>  	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
>>  					  lockdep_is_held(&reuseport_lock));
>>
>> -	/* At least one of the sk in this reuseport group is added to
>> -	 * a bpf map. Notify the bpf side. The bpf map logic will
>> -	 * remove the sk if it is indeed added to a bpf map.
>> +	/* Notify the bpf side. The sk may be added to bpf map. The
>> +	 * bpf map logic will remove the sk from the map if indeed.
> s/indeed/needed/ ?
>
> I think it would be good to have a few words here, like: this is needed
> by sockarray but not necessary for sockmap, which has its own ->unhash
> to remove itself from the map.

Yeah, I didn't do it justice. Will expand the comment.

[...]
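[Editor's note, not part of the thread: the "tear-down happens first" argument above can be illustrated with a toy userspace model. This is not kernel code; the struct and function names merely stand in for the call chains in the trace. The point is that both paths end by clearing sk_user_data, so the later reuseport detach is a harmless second clear.]

```c
#include <stddef.h>

/* Toy model of a socket carrying BPF-side state. */
struct sock {
	void *sk_user_data;  /* psock / map-entry pointer in the kernel */
};

/* Stands in for the tcp_bpf_unhash() chain, which ends in
 * rcu_assign_sk_user_data(sk, NULL). Runs first. */
static void sockmap_teardown(struct sock *sk)
{
	sk->sk_user_data = NULL;
}

/* Stands in for reuseport_detach_sock() -> bpf_sk_reuseport_detach()
 * -> WRITE_ONCE(sk->sk_user_data, NULL). Runs second; clearing an
 * already-NULL pointer is a no-op, hence no conflict. */
static void reuseport_detach(struct sock *sk)
{
	sk->sk_user_data = NULL;
}
```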