Jason Xing wrote: > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > Jason Xing wrote: > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote: > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in > > > > > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at > > > > > the bpf_sk_storage ...". > > > > > > > > I will revise it. Thanks. > > > > > > > > > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr. > > > > > > > > > > > > > > > > It is accurate since they are under the protect of socket lock. > > > > > > More details can be found in the selftest. > > > > > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e. > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch). > > > > > > > > > > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx> > > > > > > --- > > > > > > include/uapi/linux/bpf.h | 7 +++++++ > > > > > > net/ipv4/tcp.c | 1 + > > > > > > tools/include/uapi/linux/bpf.h | 7 +++++++ > > > > > > 3 files changed, 15 insertions(+) > > > > > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > index 800122a8abe5..accb3b314fff 100644 > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > @@ -7052,6 +7052,13 @@ enum { > > > > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > > > > * feature is on. > > > > > > */ > > > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > > > > + * is triggered. For TCP, it stays > > > > > > + * in the last send process to > > > > > > + * correlate with tcp_sendmsg timestamp > > > > > > + * with other timestamping callbacks, > > > > > > + * like SND/SW/ACK. > > > > > > > > > > Do you have a chance to look at how this will work at UDP? > > > > > > > > Sure, I feel like it could not be useful for UDP. Well, things get > > > > strange because I did write a long paragraph about this thing which > > > > apparently disappeared... > > > > > > > > I manage to find what I wrote: > > > > For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because > > > > there are two sending process, 1) lockless path, 2) lock path, which > > > > should be handled carefully later. For the former, even though it's > > > > unlikely multiple threads access the socket to call sendmsg at the > > > > same time, I think we'd better not correlate it like what we do to the > > > > TCP case because of the lack of sock lock protection. Considering SND_CB is > > > > uapi flag, I think we don't need to forcely add the 'TCP_' prefix in > > > > case we need to use it someday. > > > > > > > > And one more thing is I'd like to use the v5[1] method in the next round > > > > to introduce a new tskey_bpf which is good for UDP type. The new field > > > > will not conflict with the tskey in shared info which is generated > > > > by sk->sk_tskey in __ip_append_data(). It hardly works if both features > > > > (so_timestamping and its bpf extension) exists at the same time. Users > > > > could get confused because sometimes they fetch the tskey from skb, > > > > sometimes they don't, especially when we have cmsg feature to turn it on/ > > > > off per sendmsg. A standalone tskey for bpf extension will be needed. > > > > With this tskey_bpf, we can easily correlate the timestamp in sendmsg > > > > syscall with other tx points(SND/SW/ACK...). > > > > > > > > [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@xxxxxxxxx/ > > > > > > > > If possible, we can leave this question until the UDP support series > > > > shows up. I will figure out a better solution :) > > > > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI > > > > flag so I consider the compatibility reason. > > > > > > I don't think this is acceptable. We should aim for an API that can > > > easily be used across protocols, like SO_TIMESTAMPING. > > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are > > adjusted like below: > > > > It's hard to provide an absolutely uniform interface or usage to users > > for TCP and UDP and even more protocols. Cases can be handled one by > > one. We should try hard. SO_TIMESTAMPING is uniform across protocols. An interface that is not is just hard to use. > > The main obstacle is how we can correlate the timestamp in > > sendmsg syscall with other sending timestamps. It's worth noticing > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the > > userspace. For instance, while skb enters the qdisc, we fail to know > > which skb belongs to which sendmsg. > > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the > > protection of socket lock + syscall as the current patch does. But for > > UDP, it can be lockless. IIUC, there is a very special case where even > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same > > socket send UDP packets in parallel, then users could be confused > > which tskey matches which sendmsg. This is a known issue for lockless datagram sockets. With SO_TIMESTAMPING, but the use of timestamping and of concurrent sendmsg calls is under control of the process, so it only shoots itself in the foot. With BPF timestamping, a process may confuse a third party admin, so the situation is slightly different. > > IIUC, I will not consider this > > unlikely case, then the UDP case is quite similar to the TCP case. > > > > The scenario for the UDP case is: > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp > > like TCP does in this series. > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey > > with timestamp. > > Note: tskey in UDP will be handled carefully in a different way > > because we should support both modes for socket timestamping at the > > same time. > > It's really similar to TCP regardless of handling tskey. > > > > To be more precise in case you don't have much time to read the above > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate > sendmsg timestamp with corresponding tskey. > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does. > > 2. For UDP, we can correlate in __ip_append_data() along with those > tskey initialization, assuming there are no multiple threads calling > locklessly ip_make_skb(). Locked path > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock > protection, so it can be easily handled. Lockless path > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at > the same time, which should be handled properly. Different hook points is fine, as UDP (and RAW) uses __ip_append_data or more importantly ip_send_skb, while TCP uses ip_queue_xmit. As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB) and the behavior of that operation. Subject to the usual distinction between protocol behavior (bytestream vs datagram). > I prefer to implement > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic, > I think. This might be the only one corner case, IIUC? This sounds like an entirely different topic? Not sure what this is. > Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do > the correlation job. > > To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB > to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the > best one. What do you think about this? > > [1] > commit 4aecca4c76808f3736056d18ff510df80424bc9f > Author: Vadim Fedorenko <vadim.fedorenko@xxxxxxxxx> > Date: Tue Oct 1 05:57:14 2024 -0700 > > net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message > > SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX > timestamps and packets sent via socket. Unfortunately, there is no way > to reliably predict socket timestamp ID value in case of error returned > by sendmsg. For UDP sockets it's impossible because of lockless > nature of UDP transmit, several threads may send packets in parallel. In > case of RAW sockets MSG_MORE option makes things complicated. More > details are in the conversation [1]. > This patch adds new control message type to give user-space > software an opportunity to control the mapping between packets and > values by providing ID with each sendmsg for UDP sockets. > The documentation is also added in this patch. > > [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@xxxxxxxxxxxxxx/ > > Thanks, > Jason