On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> > >
> > > A few weeks ago, I planned to extend SO_TIMESTAMPING feature by using
> > > tracepoint to print information (say, tstamp) so that we can
> > > transparently equip applications with this feature and require no
> > > modification in user side.
> > >
> > > Later, we discussed at netconf and agreed that we can use bpf for better
> > > extension, which is mainly suggested by John Fastabend and Willem de
> > > Bruijn. Many thanks here! So I post this series to see if we have a
> > > better solution to extend.
> > >
> > > This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> > > users only needs to pass certain flags through bpf program to make sure
> > > the last skb from each sendmsg() has timestamp related controlled flag.
> > > For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> > > and wait for the moment when recvmsg() is called.
> >
> > As you mention, overall I am very supportive of having a way to add
> > timestamping by administrators, without having to rebuild applications.
> > BPF hooks seem to be the right place for this.
> >
> > There is existing kprobe/kretprobe/kfunc support. Supporting
> > SO_TIMESTAMPING directly may be useful due to its targeted feature
> > set, and correlation between measurements for the same data in the
> > stream.
> >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >
> > My main implementation concern is where this API overlaps with the
> > existing user API, and how they might conflict. A few questions in the
> > patches.
>
> Agreed. That's also what I'm concerned about. So I decided to ask for
> related experts' help.
>
> How to deal with it without interfering with the existing apps in the
> right way is the key problem.

What I am trying to implement is to let the bpf program have the
highest precedence. It is similar to RTO min; see this commit as an
example:

commit f086edef71be7174a16c1ed67ac65a085cda28b1
Author: Kevin Yang <yyd@xxxxxxxxxx>
Date:   Mon Jun 3 21:30:54 2024 +0000

    tcp: add sysctl_tcp_rto_min_us

    Adding a sysctl knob to allow user to specify a default
    rto_min at socket init time, other than using the hard
    coded 200ms default rto_min.

    Note that the rto_min route option has the highest precedence
    for configuring this setting, followed by the TCP_BPF_RTO_MIN
    socket option, followed by the tcp_rto_min_us sysctl.

It covers three cases: 1) route option, 2) bpf option, 3) sysctl.
The one with the highest priority overrides the others. There is no
good point at which to restore the icsk_rto_min field if users want to
shut down the bpf program, because the field is set in
bpf_sol_tcp_setsockopt().
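
To make the precedence and the restore problem concrete, here is a
minimal user-space sketch in C. It is not kernel code: struct rto_cfg
and all helper names are hypothetical, and a value of 0 is assumed to
mean "not configured". Only the ordering (route option, then BPF, then
sysctl) comes from the commit message above. That the BPF value and the
sysctl default share one per-socket field is my assumption, based on the
icsk_rto_min remark.

```c
/* Hypothetical model of the rto_min precedence; not kernel code. */
struct rto_cfg {
    unsigned int route_us; /* per-route option, 0 = not configured */
    unsigned int sock_us;  /* per-socket field shared by sysctl and BPF */
};

/* Socket init: the per-socket field starts from the sysctl default. */
static void rto_cfg_init(struct rto_cfg *c, unsigned int sysctl_us)
{
    c->route_us = 0;
    c->sock_us = sysctl_us;
}

/*
 * BPF-style override: it writes the same per-socket field, so the
 * sysctl-derived value is overwritten and cannot be recovered later
 * unless the caller saved it somewhere first.
 */
static void rto_cfg_bpf_set(struct rto_cfg *c, unsigned int us)
{
    c->sock_us = us;
}

/* Lookup: the route option wins whenever it is configured. */
static unsigned int rto_min_effective(const struct rto_cfg *c)
{
    return c->route_us ? c->route_us : c->sock_us;
}
```

With a sysctl default of 200000 us, calling rto_cfg_bpf_set(&c, 5000)
makes the effective value 5000, and setting route_us to 40000 makes it
40000. Clearing route_us falls back to 5000, not 200000, because the
model has no saved copy of the original default to restore. A bpf
extension of SO_TIMESTAMPING that writes sk->sk_tsflags directly would
need to think about the same question.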