On Wed, Oct 9, 2024 at 9:16 PM Vadim Fedorenko <vadim.fedorenko@xxxxxxxxx> wrote:
>
> On 09/10/2024 12:48, Jason Xing wrote:
> > On Wed, Oct 9, 2024 at 7:12 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> >>
> >> On Wed, Oct 9, 2024 at 5:28 PM Vadim Fedorenko
> >> <vadim.fedorenko@xxxxxxxxx> wrote:
> >>>
> >>> On 09/10/2024 02:05, Jason Xing wrote:
> >>>> On Wed, Oct 9, 2024 at 7:22 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Wed, Oct 9, 2024 at 2:44 AM Willem de Bruijn
> >>>>> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Jason Xing wrote:
> >>>>>>> From: Jason Xing <kernelxing@xxxxxxxxxxx>
> >>>>>>>
> >>>>>>> A few weeks ago, I planned to extend SO_TIMESTAMPING feature by using
> >>>>>>> tracepoint to print information (say, tstamp) so that we can
> >>>>>>> transparently equip applications with this feature and require no
> >>>>>>> modification in user side.
> >>>>>>>
> >>>>>>> Later, we discussed at netconf and agreed that we can use bpf for better
> >>>>>>> extension, which is mainly suggested by John Fastabend and Willem de
> >>>>>>> Bruijn. Many thanks here! So I post this series to see if we have a
> >>>>>>> better solution to extend.
> >>>>>>>
> >>>>>>> This approach relies on existing SO_TIMESTAMPING feature, for tx path,
> >>>>>>> users only need to pass certain flags through bpf program to make sure
> >>>>>>> the last skb from each sendmsg() has timestamp related controlled flag.
> >>>>>>> For rx path, we have to use bpf_setsockopt() to set the sk->sk_tsflags
> >>>>>>> and wait for the moment when recvmsg() is called.
> >>>>>>
> >>>>>> As you mention, overall I am very supportive of having a way to add
> >>>>>> timestamping by administrators, without having to rebuild applications.
> >>>>>> BPF hooks seem to be the right place for this.
> >>>>>>
> >>>>>> There is existing kprobe/kretprobe/kfunc support. Supporting
> >>>>>> SO_TIMESTAMPING directly may be useful due to its targeted feature
> >>>>>> set, and correlation between measurements for the same data in the
> >>>>>> stream.
> >>>>>>
> >>>>>>> After this series, we could step by step implement more advanced
> >>>>>>> functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >>>>>>
> >>>>>> My main implementation concern is where this API overlaps with the
> >>>>>> existing user API, and how they might conflict. A few questions in the
> >>>>>> patches.
> >>>>>
> >>>>> Agreed. That's also what I'm concerned about. So I decided to ask for
> >>>>> related experts' help.
> >>>>>
> >>>>> How to deal with it without interfering with the existing apps in the
> >>>>> right way is the key problem.
> >>>>
> >>>> What I try to implement is let the bpf program have the highest
> >>>> precedence. It's similar to RTO min, see the commit as an example:
> >>>>
> >>>> commit f086edef71be7174a16c1ed67ac65a085cda28b1
> >>>> Author: Kevin Yang <yyd@xxxxxxxxxx>
> >>>> Date: Mon Jun 3 21:30:54 2024 +0000
> >>>>
> >>>>     tcp: add sysctl_tcp_rto_min_us
> >>>>
> >>>>     Adding a sysctl knob to allow user to specify a default
> >>>>     rto_min at socket init time, other than using the hard
> >>>>     coded 200ms default rto_min.
> >>>>
> >>>>     Note that the rto_min route option has the highest precedence
> >>>>     for configuring this setting, followed by the TCP_BPF_RTO_MIN
> >>>>     socket option, followed by the tcp_rto_min_us sysctl.
> >>>>
> >>>> It includes three cases, 1) route option, 2) bpf option, 3) sysctl.
> >>>> The first priority can override others. It doesn't have a good
> >>>> chance/point to restore the icsk_rto_min field if users want to
> >>>> shutdown the bpf program because it is set in
> >>>> bpf_sol_tcp_setsockopt().
> >>>
> >>> rto_min example is slightly different. With tcp_rto_min the app doesn't
> >>> expect any data to come back to user space while for timestamping the
> >>> app may be confused directly by providing more data, or by not providing
> >>> expected data. I believe some hint about requestor of the data is needed
> >>> here. It will also help to solve the problem of populating sk_err_queue
> >>> mentioned by Martin.
> >>
> >> Sorry, I don't fully get it. In this patch series, this bpf extension
> >> feature will not rely on sk_err_queue any more to report tx timestamps
> >> to userspace. Bpf program can do that printing.
> >>
> >> Do you mean that it could be wrong if one skb carries the tsflags that
> >> are previously set due to the bpf program and then suddenly users
> >> detach the program? It indeed will put a new/cloned skb into the error
> >> queue. Interesting corner case. It seems I have to re-implement a
> >> totally independent tsflags for bpf extension feature. Do you have a
> >> better idea on this?
> >
> > I feel that if I could introduce bpf new flags like
> > SOF_TIMESTAMPING_TX_ACK_BPF for the last skb based on this patch
> > series, then it will not populate skb in sk_err_queue even users
> > remove the bpf program all of a sudden. With this kind of specific bpf
> > flags, we can also avoid conflicting with the apps using
> > SO_TIMESTAMPING feature. Let me give it a shot unless a better
> > solution shows up.
>
> It doesn't look great to have duplicate flags just to indicate that this
> particular timestamp was asked by a bpf program, even though it looks

Or introduce a new field in struct sock or struct sk_buff so that
existing SOF_TIMESTAMPING_* can be reused.

> like a straight forward solution. Sounds like we have to re-think the
> interface for timestamping requests, but I don't have proper suggestion
> right now.

Thanks for your help :)

Thanks,
Jason
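
To make the "new field" idea above a bit more concrete, here is a rough
userspace model of it. This is only a sketch, not kernel code and not
what the series implements: every name below (model_sock, bpf_tsflags,
model_tx_ack, and so on) is hypothetical, and the flag value is an
illustrative stand-in for the real SOF_TIMESTAMPING_* bits. The point is
just that if the bpf request lives in its own field, a program that is
detached between sendmsg() and the ACK cannot cause an skb to show up in
the application's error queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a SOF_TIMESTAMPING_* bit (assumption). */
#define MODEL_TSTAMP_TX_ACK (1u << 9)

/* Hypothetical model of the socket state discussed in this thread. */
struct model_sock {
	uint32_t tsflags;      /* requested by the app via SO_TIMESTAMPING */
	uint32_t bpf_tsflags;  /* hypothetical: requested only by a bpf program */
	bool bpf_attached;     /* is the bpf program still attached? */
	int err_queue_len;     /* stands in for the socket error queue */
	int bpf_reports;       /* timestamps handed to the bpf program */
};

/* Hypothetical model of the per-skb state recorded at send time. */
struct model_skb {
	uint32_t tx_flags;     /* snapshot of the app's request */
	uint32_t bpf_tx_flags; /* snapshot of the bpf request, kept separate */
};

/* sendmsg(): record both requests on the skb, each in its own field. */
static struct model_skb model_sendmsg(const struct model_sock *sk)
{
	struct model_skb skb = {
		.tx_flags = sk->tsflags,
		.bpf_tx_flags = sk->bpf_attached ? sk->bpf_tsflags : 0,
	};
	return skb;
}

/* ACK arrives: each requestor is served from its own field only. */
static void model_tx_ack(struct model_sock *sk, const struct model_skb *skb)
{
	/* Only an application request may populate the error queue. */
	if (skb->tx_flags & MODEL_TSTAMP_TX_ACK)
		sk->err_queue_len++;

	/* A bpf request is dropped silently if the program is gone. */
	if ((skb->bpf_tx_flags & MODEL_TSTAMP_TX_ACK) && sk->bpf_attached)
		sk->bpf_reports++;
}
```

In this model, setting only bpf_tsflags, sending, detaching, and then
processing the ACK leaves both err_queue_len and bpf_reports at zero,
while an application that set tsflags itself still gets its error queue
entry whether or not a bpf program is attached.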