On Tue, Oct 29, 2024 at 11:03 PM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > Jason Xing wrote: > > On Tue, Oct 29, 2024 at 9:24 AM Willem de Bruijn > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > Jason Xing wrote: > > > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > > > Use the offset to record the delta value between current socket key > > > > and bpf socket key. > > > > > > > > 1. If there is only bpf feature running, the socket key is bpf socket > > > > key and the offset is zero; > > > > 2. If there is only traditional feature running, and then bpf feature > > > > is turned on, the socket key is still used by the former while the offset > > > > is the delta between them; > > > > 3. if there is only bpf feature running, and then application uses it, > > > > the socket key would be re-init for application and the offset is the > > > > delta. > > > > > > We need to also figure out the rare conflict when one user sets > > > OPT_ID | OPT_ID_TCP while the other only uses OPT_ID. > > > > I think the current patch handles the case because: > > 1. sock_calculate_tskey_offset() gets the final key first whether the > > OPT_ID_TCP is set or not. > > 2. we will use that tskey to calculate the delta. > > Oh yes of course. Great, then this is resolved. > > > > > +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type) > > > > +{ > > > > + u32 tskey; > > > > + > > > > + if (sk_is_tcp(sk)) { > > > > + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) > > > > + return -EINVAL; > > > > + > > > > + if (val & SOF_TIMESTAMPING_OPT_ID_TCP) > > > > + tskey = tcp_sk(sk)->write_seq; > > > > + else > > > > + tskey = tcp_sk(sk)->snd_una; > > > > + } else { > > > > + tskey = 0; > > > > + } > > > > + > > > > + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) { > > > > + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey); > > > > + return 0; > > > > + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) { > > > > + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey; > > > > + } else { > > > > + sk->sk_tskey_bpf_offset = 0; > > > > + } > > > > + > > > > + return tskey; > > > > +} > > > > + > > > > int sock_set_tskey(struct sock *sk, int val, int bpf_type) > > > > { > > > > u32 tsflags = bpf_type ? sk->sk_tsflags_bpf : sk->sk_tsflags; > > > > @@ -901,17 +944,13 @@ int sock_set_tskey(struct sock *sk, int val, int bpf_type) > > > > > > > > if (val & SOF_TIMESTAMPING_OPT_ID && > > > > !(tsflags & SOF_TIMESTAMPING_OPT_ID)) { > > > > - if (sk_is_tcp(sk)) { > > > > - if ((1 << sk->sk_state) & > > > > - (TCPF_CLOSE | TCPF_LISTEN)) > > > > - return -EINVAL; > > > > - if (val & SOF_TIMESTAMPING_OPT_ID_TCP) > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq); > > > > - else > > > > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una); > > > > - } else { > > > > - atomic_set(&sk->sk_tskey, 0); > > > > - } > > > > + long int ret; > > > > + > > > > + ret = sock_calculate_tskey_offset(sk, val, bpf_type); > > > > + if (ret <= 0) > > > > + return ret; > > > > + > > > > + atomic_set(&sk->sk_tskey, ret); > > > > } > > > > > > > > return 0; > > > > @@ -956,10 +995,15 @@ static int sock_set_timestamping_bpf(struct sock *sk, > > > > struct so_timestamping timestamping) > > > > { > > > > u32 flags = timestamping.flags; > > > > + int ret; > > > > > > > > if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK) > > > > return -EINVAL; > > > > > > > > + ret = sock_set_tskey(sk, flags, 1); > > > > + if (ret) > > > > + return ret; > > > > + > > > > WRITE_ONCE(sk->sk_tsflags_bpf, flags); > > > > > > > > return 0; > > > > > > I'm a bit hazy on when this can be called. We can assume that this new > > > BPF operation cannot race with the existing setsockopt nor with the > > > datapath that might touch the atomic fields, right? > > > > It surely can race with the existing setsockopt. > > > > 1) > > if (only existing setsockopt works) { > > then sk->sk_tskey is set through setsockopt, sk_tskey_bpf_offset is 0. > > } > > > > 2) > > if (only bpf setsockopt works) { > > then sk->sk_tskey is set through bpf_setsockopt, > > sk_tskey_bpf_offset is 0. > > } > > > > 3) > > if (existing setsockopt already started, here we enable the bpf feature) { > > then sk->sk_tskey will not change, but the sk_tskey_bpf_offset > > will be calculated. > > } > > > > 4) > > if (bpf setsockopt already started, here we enable the application feature) { > > then sk->sk_tskey will re-initialized/overridden by > > setsockopt, and the sk_tskey_bpf_offset will be calculated. > > } I will copy the above to the commit message next time in order to provide a clear design to future readers. > > > > Then the skb tskey will use the sk->sk_tskey like before. > > I mean race as in the setsockopt and bpf setsockopt and datapath > running concurrently. > > As long as both variants of setsockopt hold the socket lock, that > won't happen. > > The datapath is lockless for UDP, so atomic_inc sk_tskey can race > with calculating the difference. But this is a known issue. A process > that cares should not run setsockopt and send concurrently. So this is > fine too. Oh, now I see. Thanks for the detailed explanation! So Do you feel if we need to take care of this in the future, I mean, after this series gets merged...? Thanks, Jason