On Wed, Oct 16, 2024 at 1:35 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 10/15/24 6:24 PM, Jason Xing wrote: > > On Wed, Oct 16, 2024 at 9:01 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > >> > >> On 10/11/24 9:06 PM, Jason Xing wrote: > >>> From: Jason Xing <kernelxing@xxxxxxxxxxx> > >>> > >>> Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to > >>> print timestamps when the skb just passes the dev layer. > >>> > >>> Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > >>> --- > >>> include/uapi/linux/bpf.h | 5 +++++ > >>> net/core/skbuff.c | 17 +++++++++++++++-- > >>> tools/include/uapi/linux/bpf.h | 5 +++++ > >>> 3 files changed, 25 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>> index 157e139ed6fc..3cf3c9c896c7 100644 > >>> --- a/include/uapi/linux/bpf.h > >>> +++ b/include/uapi/linux/bpf.h > >>> @@ -7019,6 +7019,11 @@ enum { > >>> * by the kernel or the > >>> * earlier bpf-progs. > >>> */ > >>> + BPF_SOCK_OPS_TS_SCHED_OPT_CB, /* Called when skb is passing through > >>> + * dev layer when SO_TIMESTAMPING > >>> + * feature is on. It indicates the > >>> + * recorded timestamp. > >>> + */ > >>> }; > >>> > >>> /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect > >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >>> index 3a4110d0f983..16e7bdc1eacb 100644 > >>> --- a/net/core/skbuff.c > >>> +++ b/net/core/skbuff.c > >>> @@ -5632,8 +5632,21 @@ static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype) > >>> return; > >>> > >>> tp = tcp_sk(sk); > >>> - if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) > >>> - return; > >>> + if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) { > >>> + struct timespec64 tstamp; > >>> + u32 cb_flag; > >>> + > >>> + switch (tstype) { > >>> + case SCM_TSTAMP_SCHED: > >>> + cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB; > >>> + break; > >>> + default: > >>> + return; > >>> + } > >>> + > >>> + tstamp = ktime_to_timespec64(ktime_get_real()); > >>> + tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec); > >> > >> There is bpf_ktime_get_*() helper. The bpf prog can directly call the > >> bpf_ktime_get_* helper and use whatever clock it sees fit instead of enforcing > >> real clock here and doing an extra ktime_to_timespec64. Right now the > >> bpf_ktime_get_*() does not have real clock which I think it can be added. > > > > In this way, there is no need to add tcp_call_bpf_*arg() to pass > > timestamp to userspace, right? Let the bpf program implement it. > > > > Now I wonder what information I should pass? Sorry for the lack of BPF > > related knowledge :( > > Just pass the cb_flag op in this case. I see. I saw one example just passing a NULL parameter: tcp_call_bpf(sk, BPF_SOCK_OPS_BASE_RTT, 0, NULL);. > > A bpf selftest is missing in this series to show how it is going to be used. Sorry, I didn't implement a standard selftest, but I wrote a full BPF program in patch[0/12]. I planned to write a selftests after every expert agrees the current approach. > Yes, there are existing socket API tests on time stamping but I believe this > discussion has already shown some subtle differences that warrant a closer to > real world bpf prog example first. > > > > >> > >> I think overall the tstamp reporting interface does not necessarily have to > >> follow the socket API. The bpf prog is running in the kernel. It could pass > >> other information to the bpf prog if it sees fit. e.g. the bpf prog could also > >> get the original transmitted tcp skb if it is useful. > > > > Good to know that! But how the BPF program parses the skb by using > > tcp_call_bpf_2arg() which only passes u32 parameters. > > "struct skbuff *skb" has already been added to "struct bpf_sock_ops_kern". It is > only assigned during the "BPF_SOCK_OPS_PARSE_*HDR_CB". It is not exposed > directly to bpf prog but it could be. However, it may need to change some > convert_ctx code in filter.c which I am not excited about. We haven't added > convert_ctx changes for a while since it is the old way. > > Together with the "u32 bpf_sock_ops_cb_flags;" change in patch 9 which is only > for tcp_sock and other _CB flags are also tcp specific only. For now, I am not Right, the first move I made is to make TCP work. > sure carrying this sockops to the future UDP support is desired. I hope so. But it's not an urgent thing that needs to be done recently. > > Take a look at tcp_call_bpf(). It needs to initialize the whole "struct > bpf_sock_ops_kern" regardless of what the bpf prog is needed before calling the > bpf prog. The "u32 args[4]" is one of them. The is the older way of using bpf to > extend kernel. I see. > > bpf has struct_ops support now which can pass only what is needed and without > the need of doing the convert_ctx in filter.c. The "struct tcp_congestion_ops" > can already be implemented in bpf. Take a look at > selftests/bpf/progs/bpf_cubic.c. All the BPF_SOCK_OPS_*_CB (e.g. > BPF_SOCK_OPS_TS_SCHED_OPT_CB here) could just a "ops" in the struct_ops. Interesting, but it seems this way is much more complex than the current approach. > > That said, I think the first thing needs to figure out is how to enable bpf time > stamping without having side effect on the user space. In the next version, I will avoid affecting the cmsg case, so no more side effects I think. > Continue the sockops approach first I'm a little hesitant to do so because it looks like we will introduce more codes. Please let me investigate more :) > and use it to create a selftest bpf prog example. Then we can decide. I copy the BPF program from patch [0/12], please take a look and help me review this: --- Here is the test output: 1) receive path iperf3-987305 [008] ...11 179955.200990: bpf_trace_printk: rx: port: 5201:55192, swtimestamp: 1728167973,670426346, hwtimestamp: 0,0 2) xmit path iperf3-19765 [013] ...11 2021.329602: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436678584 iperf3-19765 [013] b..11 2021.329611: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436689976 iperf3-19765 [013] ...11 2021.329622: bpf_trace_printk: tx: port: 47528:5201, key: 1036, timestamp: 1728357067,436700739 Here is the full bpf program: #include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <bpf/bpf_endian.h> #include <uapi/linux/net_tstamp.h> int _version SEC("version") = 1; char _license[] SEC("license") = "GPL"; # define SO_TIMESTAMPING 37 __section("sockops") int set_initial_rto(struct bpf_sock_ops *skops) { int op = (int) skops->op; u32 sport = 0, dport = 0; int flags; switch (op) { //case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: case BPF_SOCK_OPS_TCP_CONNECT_CB: case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: flags = SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_TX_SCHED | SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_OPT_ID_TCP; bpf_setsockopt(skops, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)); bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG|BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG); break; case BPF_SOCK_OPS_TS_SCHED_OPT_CB: case BPF_SOCK_OPS_TS_SW_OPT_CB: case BPF_SOCK_OPS_TS_ACK_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("tx: port: %u:%u, key: %u, timestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2]); break; case BPF_SOCK_OPS_TS_RX_OPT_CB: dport = bpf_ntohl(skops->remote_port); sport = skops->local_port; bpf_printk("rx: port: %u:%u, swtimestamp: %u,%u, hwtimestamp: %u,%u\n", sport, dport, skops->args[0], skops->args[1], skops->args[2], skops->args[3]); break; } return 1; } --- What is your opinion on the above? Thanks, Jason