On Fri, Jan 10, 2020 at 11:50:16AM +0100, Jakub Sitnicki wrote: > With the realization that properly cloning listening sockets that have > psock state/callbacks is tricky, comes the second version of patches. > > The spirit of the patch set stays the same - make SOCKMAP a generic > collection for listening and established sockets. This would let us use the > SOCKMAP with reuseport today, and in the future hopefully with BPF programs > that run at socket lookup time [0]. For a bit more context, please see v1 > cover letter [1]. > > The biggest change that happened since v1 is how we deal with clearing > psock state in a copy of parent socket when cloning it (patches 3 & 4). > > As much as I did not want to touch icsk/tcp clone path, it seems > unavoidable. The changes were kept down to a minimum, with attention to not > break existing users. That said, a review from the TCP maintainer would be > invaluable (patches 3 & 4). > > Patches 1 & 2 will conflict with recently posted "Fixes for sockmap/tls > from more complex BPF progs" series [0]. I'll adapt or split them out this > series once sockmap/tls fixes from John land in bpf-next branch. > > Some food for thought - is mixing listening and established sockets in the > same BPF map a good idea? I don't know but I couldn't find a good reason to > restrict the user. > > Considering how much the code evolved, I didn't carry over Acks from v1. > > Thanks, > jkbs > > [0] https://lore.kernel.org/bpf/157851776348.1732.12600714815781177085.stgit@ubuntu3-kvm2/T/#t > [1] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@xxxxxxxxxxxxxx/ > > v1 -> v2: > > - af_ops->syn_recv_sock callback is no longer overridden and burdened with > restoring sk_prot and clearing sk_user_data in the child socket. As child > socket is already hashed when syn_recv_sock returns, it is too late to > put it in the right state. Instead patches 3 & 4 restore sk_prot and > clear sk_user_data before we hash the child socket. (Pointed out by > Martin Lau) > > - Annotate shared access to sk->sk_prot with READ_ONCE/WRITE_ONCE macros as > we write to it from sk_msg while socket might be getting cloned on > another CPU. (Suggested by John Fastabend) > > - Convert tests for SOCKMAP holding listening sockets to return-on-error > style, and hook them up to test_progs. Also use BPF skeleton for setup. > Add new tests to cover the race scenario discovered during v1 review. lgtm Martin, John, please review.