Jason Xing wrote: > On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > Jason Xing wrote: > > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > > > > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn > > > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > > > > > Jason Xing wrote: > > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > > > > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote: > > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other > > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of > > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in > > > > > > > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at > > > > > > > the bpf_sk_storage ...". > > > > > > > > > > > > I will revise it. Thanks. > > > > > > > > > > > > > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr. > > > > > > > > > > > > > > > > > > > > > > It is accurate since they are under the protect of socket lock. > > > > > > > > More details can be found in the selftest. > > > > > > > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at > > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e. > > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch). > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx> > > > > > > > > --- > > > > > > > > include/uapi/linux/bpf.h | 7 +++++++ > > > > > > > > net/ipv4/tcp.c | 1 + > > > > > > > > tools/include/uapi/linux/bpf.h | 7 +++++++ > > > > > > > > 3 files changed, 15 insertions(+) > > > > > > > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > > > index 800122a8abe5..accb3b314fff 100644 > > > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > > > @@ -7052,6 +7052,13 @@ enum { > > > > > > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > > > > > > * feature is on. > > > > > > > > */ > > > > > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > > > > > > + * is triggered. For TCP, it stays > > > > > > > > + * in the last send process to > > > > > > > > + * correlate with tcp_sendmsg timestamp > > > > > > > > + * with other timestamping callbacks, > > > > > > > > + * like SND/SW/ACK. > > > > > > > > > > > > > > Do you have a chance to look at how this will work at UDP? > > > > > > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get > > > > > > strange because I did write a long paragraph about this thing which > > > > > > apparently disappeared... > > > > > > > > > > > > I manage to find what I wrote: > > > > > > For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because > > > > > > there are two sending process, 1) lockless path, 2) lock path, which > > > > > > should be handled carefully later. For the former, even though it's > > > > > > unlikely multiple threads access the socket to call sendmsg at the > > > > > > same time, I think we'd better not correlate it like what we do to the > > > > > > TCP case because of the lack of sock lock protection. Considering SND_CB is > > > > > > uapi flag, I think we don't need to forcely add the 'TCP_' prefix in > > > > > > case we need to use it someday. > > > > > > > > > > > > And one more thing is I'd like to use the v5[1] method in the next round > > > > > > to introduce a new tskey_bpf which is good for UDP type. The new field > > > > > > will not conflict with the tskey in shared info which is generated > > > > > > by sk->sk_tskey in __ip_append_data(). It hardly works if both features > > > > > > (so_timestamping and its bpf extension) exists at the same time. Users > > > > > > could get confused because sometimes they fetch the tskey from skb, > > > > > > sometimes they don't, especially when we have cmsg feature to turn it on/ > > > > > > off per sendmsg. A standalone tskey for bpf extension will be needed. > > > > > > With this tskey_bpf, we can easily correlate the timestamp in sendmsg > > > > > > syscall with other tx points(SND/SW/ACK...). > > > > > > > > > > > > [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@xxxxxxxxx/ > > > > > > > > > > > > If possible, we can leave this question until the UDP support series > > > > > > shows up. I will figure out a better solution :) > > > > > > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI > > > > > > flag so I consider the compatibility reason. > > > > > > > > > > I don't think this is acceptable. We should aim for an API that can > > > > > easily be used across protocols, like SO_TIMESTAMPING. > > > > > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are > > > > adjusted like below: > > > > > > > > It's hard to provide an absolutely uniform interface or usage to users > > > > for TCP and UDP and even more protocols. Cases can be handled one by > > > > one. > > > > We should try hard. SO_TIMESTAMPING is uniform across protocols. > > An interface that is not is just hard to use. > > > > > > The main obstacle is how we can correlate the timestamp in > > > > sendmsg syscall with other sending timestamps. It's worth noticing > > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the > > > > userspace. For instance, while skb enters the qdisc, we fail to know > > > > which skb belongs to which sendmsg. > > > > > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate > > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the > > > > protection of socket lock + syscall as the current patch does. But for > > > > UDP, it can be lockless. IIUC, there is a very special case where even > > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same > > > > socket send UDP packets in parallel, then users could be confused > > > > which tskey matches which sendmsg. > > > > This is a known issue for lockless datagram sockets. > > > > With SO_TIMESTAMPING, but the use of timestamping and of concurrent > > sendmsg calls is under control of the process, so it only shoots > > itself in the foot. > > > > With BPF timestamping, a process may confuse a third party admin, so > > the situation is slightly different. > > Agreed. > > > > > > > IIUC, I will not consider this > > > > unlikely case, then the UDP case is quite similar to the TCP case. > > > > > > > > The scenario for the UDP case is: > > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp > > > > like TCP does in this series. > > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the > > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey > > > > with timestamp. > > > > Note: tskey in UDP will be handled carefully in a different way > > > > because we should support both modes for socket timestamping at the > > > > same time. > > > > It's really similar to TCP regardless of handling tskey. > > > > > > > > > > To be more precise in case you don't have much time to read the above > > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate > > > sendmsg timestamp with corresponding tskey. > > > > > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does. > > > > > > 2. For UDP, we can correlate in __ip_append_data() along with those > > > tskey initialization, assuming there are no multiple threads calling > > > locklessly ip_make_skb(). Locked path > > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock > > > protection, so it can be easily handled. Lockless path > > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at > > > the same time, which should be handled properly. > > > > Different hook points is fine, as UDP (and RAW) uses __ip_append_data > > Then this approach (introducing this new flag) is feasible. Sorry that > last night I wrote such a long paragraph which buried something > important. Because of that, I rephrase the whole idea about how to let > UDP work with this kind of new flag in [patch v8 11/12]. Link is > https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@xxxxxxxxxxxxxx/ > > > or more importantly ip_send_skb, while TCP uses ip_queue_xmit. > > For TCP, we use tcp_tx_timestamp to finish the map between sendmsg > timestamp and tskey. > > > > > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB) > > and the behavior of that operation. Subject to the usual distinction > > between protocol behavior (bytestream vs datagram). > > I see your point. > > > > > > I prefer to implement > > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic, > > > I think. This might be the only one corner case, IIUC? > > > > This sounds like an entirely different topic? Not sure what this is. > > Not really a different topic. I mean let bpf prog take the whole > control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB > flag we can correlate the sendmsg timestamp with tskey. So It has > something to do with the usage of UDP. Please take a look at that link > to patch 11/12. For TCP, we don't need to care about the value of > tskey which has already been taken care of by SO_TIMESTAMPING. So it > is slightly different. I'm not sure if this kind of usage is > acceptable? Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot? BPF will need to set the key for both protocol if SO_TIMESTAMPING is not enabled, right?