On Wed, Oct 30, 2024 at 3:45 AM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > > +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type) > > > > > > +{ > > > > > > + u32 tskey; > > > > > > + > > > > > > + if (sk_is_tcp(sk)) { > > > > > > + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) > > > > > > + return -EINVAL; > > > > > > + > > > > > > + if (val & SOF_TIMESTAMPING_OPT_ID_TCP) > > > > > > + tskey = tcp_sk(sk)->write_seq; > > > > > > + else > > > > > > + tskey = tcp_sk(sk)->snd_una; > > > > > > + } else { > > > > > > + tskey = 0; > > > > > > + } > > > > > > + > > > > > > + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) { > > > > > > + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey); > > > > > > + return 0; > > > > > > + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) { > > > > > > + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey; > > > > > > + } else { > > > > > > + sk->sk_tskey_bpf_offset = 0; > > > > > > + } > > > > > > + > > > > > > + return tskey; > > > > > > +} > > > > > > + > > > > > > int sock_set_tskey(struct sock *sk, int val, int bpf_type) > > > > > > { > > > > > > u32 tsflags = bpf_type ? sk->sk_tsflags_bpf : sk->sk_tsflags; > > > > > > @@ -901,17 +944,13 @@ int sock_set_tskey(struct sock *sk, int val, int bpf_type) > > > > > > > > > > > > if (val & SOF_TIMESTAMPING_OPT_ID && > > > > > > !(tsflags & SOF_TIMESTAMPING_OPT_ID)) { > > > > > > - if (sk_is_tcp(sk)) { > > > > > > - if ((1 << sk->sk_state) & > > > > > > - (TCPF_CLOSE | TCPF_LISTEN)) > > > > > > - return -EINVAL; > > > > > > - if (val & SOF_TIMESTAMPING_OPT_ID_TCP) > > > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq); > > > > > > - else > > > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una); > > > > > > - } else { > > > > > > - atomic_set(&sk->sk_tskey, 0); > > > > > > - } > > > > > > + long int ret; > > > > > > + > > > > > > + ret = sock_calculate_tskey_offset(sk, val, bpf_type); > > > > > > + if (ret <= 0) > > > > > > + return ret; > > > > > > + > > > > > > + atomic_set(&sk->sk_tskey, ret); > > > > > > } > > > > > > > > > > > > return 0; > > > > > > @@ -956,10 +995,15 @@ static int sock_set_timestamping_bpf(struct sock *sk, > > > > > > struct so_timestamping timestamping) > > > > > > { > > > > > > u32 flags = timestamping.flags; > > > > > > + int ret; > > > > > > > > > > > > if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK) > > > > > > return -EINVAL; > > > > > > > > > > > > + ret = sock_set_tskey(sk, flags, 1); > > > > > > + if (ret) > > > > > > + return ret; > > > > > > + > > > > > > WRITE_ONCE(sk->sk_tsflags_bpf, flags); > > > > > > > > > > > > return 0; > > > > > > > > > > I'm a bit hazy on when this can be called. We can assume that this new > > > > > BPF operation cannot race with the existing setsockopt nor with the > > > > > datapath that might touch the atomic fields, right? > > > > > > > > It surely can race with the existing setsockopt. > > > > > > > > 1) > > > > if (only existing setsockopt works) { > > > > then sk->sk_tskey is set through setsockopt, sk_tskey_bpf_offset is 0. > > > > } > > > > > > > > 2) > > > > if (only bpf setsockopt works) { > > > > then sk->sk_tskey is set through bpf_setsockopt, > > > > sk_tskey_bpf_offset is 0. > > > > } > > > > > > > > 3) > > > > if (existing setsockopt already started, here we enable the bpf feature) { > > > > then sk->sk_tskey will not change, but the sk_tskey_bpf_offset > > > > will be calculated. > > > > } > > > > > > > > 4) > > > > if (bpf setsockopt already started, here we enable the application feature) { > > > > then sk->sk_tskey will re-initialized/overridden by > > > > setsockopt, and the sk_tskey_bpf_offset will be calculated. > > > > } > > > > I will copy the above to the commit message next time in order to > > provide a clear design to future readers. > > > > > > > > > > Then the skb tskey will use the sk->sk_tskey like before. > > > > > > I mean race as in the setsockopt and bpf setsockopt and datapath > > > running concurrently. > > > > > > As long as both variants of setsockopt hold the socket lock, that > > > won't happen. > > > > > > The datapath is lockless for UDP, so atomic_inc sk_tskey can race > > > with calculating the difference. But this is a known issue. A process > > > that cares should not run setsockopt and send concurrently. So this is > > > fine too. > > > > Oh, now I see. Thanks for the detailed explanation! So Do you feel if > > we need to take care of this in the future, I mean, after this series > > gets merged...? > > If there is a race condition, then that cannot be fixed up later. > > But from my admittedly brief analysis, it seems that there is nothing > here that needs to be fixed: control plane operations (setsockopt) > hold the socket lock. A setsockopt that conflicts with a lockless > datapath update will have a slightly ambiguous offset. It is under > controlof and up to the user to avoid that if they care. I got it. Thanks.