On Thu, Oct 31, 2024 at 10:41 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > On Thu, Oct 31, 2024 at 9:17 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > On 10/29/24 11:50 PM, Jason Xing wrote: > > > On Wed, Oct 30, 2024 at 1:42 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > >> > > >> On 10/28/24 4:05 AM, Jason Xing wrote: > > >>> +/* Used to track the tskey for bpf extension > > >>> + * > > >>> + * @sk_tskey: bpf extension can use it only when no application uses. > > >>> + * Application can use it directly regardless of bpf extension. > > >>> + * > > >>> + * There are three strategies: > > >>> + * 1) If we've already set through setsockopt() and here we're going to set > > >>> + * OPT_ID for bpf use, we will not re-initialize the @sk_tskey and will > > >>> + * keep the record of delta between the current "key" and previous key. > > >>> + * 2) If we've already set through bpf_setsockopt() and here we're going to > > >>> + * set for application use, we will record the delta first and then > > >>> + * override/initialize the @sk_tskey. > > >>> + * 3) other cases, which means only either of them takes effect, so initialize > > >>> + * everything simplely. > > >>> + */ > > >>> +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type) > > >>> +{ > > >>> + u32 tskey; > > >>> + > > >>> + if (sk_is_tcp(sk)) { > > >>> + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) > > >>> + return -EINVAL; > > >>> + > > >>> + if (val & SOF_TIMESTAMPING_OPT_ID_TCP) > > >>> + tskey = tcp_sk(sk)->write_seq; > > >>> + else > > >>> + tskey = tcp_sk(sk)->snd_una; > > >>> + } else { > > >>> + tskey = 0; > > >>> + } > > >>> + > > >>> + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) { > > >>> + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey); > > >>> + return 0; > > >>> + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) { > > >>> + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey; > > >>> + } else { > > >>> + sk->sk_tskey_bpf_offset = 0; > > >>> + } > > >>> + > > >>> + return tskey; > > >>> +} > > >> > > >> Before diving into this route, the bpf prog can peek into the tcp seq no in the > > >> skb. It can also look at the sk->sk_tskey for UDP socket. Can you explain why > > >> those are not enough information for the bpf prog? > > > > > > Well, it does make sense. It seems we don't need to implement tskey > > > for this bpf feature... > > > > > > Due to lack of enough knowledge of bpf, could you provide more hints > > > that I can follow to write a bpf program to print more information > > > from the skb? Like in the last patch of this series, in > > > tools/testing/selftests/bpf/prog_tests/so_timestamping.c, do we have a > > > feasible way to do that? > > > > The bpf-prog@sendmsg() will be run to capture a timestamp for sendmsg(). > > When running the bpf-prog@sendmsg(), the skb can be set to the "struct > > bpf_sock_ops_kern sock_ops;" which is passed to the sockops prog. Take a look at > > bpf_skops_write_hdr_opt(). > > Thanks. I see the skb field in struct bpf_sock_ops_kern. > > > > > bpf prog cannot directly access the skops->skb now. It is because the sockops > > prog sees the uapi "struct bpf_sock_ops" instead of "struct > > bpf_sock_ops(_kern)". The conversion is done in sock_ops_convert_ctx_access. It > > is an old way before BTF. I don't want to extend the uapi "struct bpf_sock_ops". > > Oh, so it seems we cannot use this way, right? > > I also noticed a use case that allow users to get the information from one skb: > "int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)" in > tools/testing/selftests/bpf/progs/netif_receive_skb.c > But it requires us to add the tracepoint in __skb_tstamp_tx() first. > Two months ago, I was planning to use a tracepoint for some people who > find it difficult to deploy bpf. > > > > > Instead, use bpf_cast_to_kern_ctx((struct bpf_sock_ops *)skops_ctx) to get a > > trusted "struct bpf_sock_ops(_kern) *skops" pointer. Then it can access the > > skops->skb. > > Let me spend some time on it. Thanks. > > > afaik, the tcb->seq should be available already during sendmsg. it > > should be able to get it from TCP_SKB_CB(skb)->seq with the bpf_core_cast. Take > > a look at the existing examples of bpf_core_cast. > > > > The same goes for the skb->data. It can use the bpf_dynptr_from_skb(). It is not > > available to skops program now but should be easy to expose. > > I wonder what the use of skb->data is here. > > > > > The bpf prog wants to calculate the delay between [sendmsg, SCHED], [SCHED, > > SND], [SND, ACK]. It is why (at least in my mental model) a key is needed to > > co-relate the sendmsg, SCHED, SND, and ACK timestamp. The tcp seqno could be > > served as that key. > > > > All that said, while looking at tcp_tx_timestamp() again, there is always > > "shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;". shinfo->tskey can be > > used directly as-is by the bpf prog. I think now I am missing why the bpf prog > > needs the sk_tskey in the sk? > > As you said, tcp seqno could be treated as the key, but it leaks the > information in TCP layer to users. Please see the commit: > commit 4ed2d765dfaccff5ebdac68e2064b59125033a3b > Author: Willem de Bruijn <willemb@xxxxxxxxxx> > Date: Mon Aug 4 22:11:49 2014 -0400 > > net-timestamp: TCP timestamping > ... > - To avoid leaking the absolute seqno to userspace, the offset > returned in ee_data must always be relative. It is an offset between > an skb and sk field. > > It has to be computed in the kernel before reporting to the user space, I think. Well, I'm thinking since the BPF program can only be used by _admin_, we will not take any risk even if the raw seq is exported to the BPF program. Willem, I would like to know your opinions about this point (about whether we can export the raw seqno or not). Thanks. Thanks, Jason