Martin KaFai Lau wrote: > On 10/20/24 2:51 PM, Willem de Bruijn wrote: > > Jason Xing wrote: > >> From: Jason Xing <kernelxing@xxxxxxxxxxx> > >> > >> Willem suggested that we use a static key to control. The advantage > >> is that we will not affect the existing applications at all if we > >> don't load BPF program. > >> > >> In this patch, except the static key, I also add one logic that is > >> used to test if the socket has enabled its tsflags in order to > >> support bpf logic to allow both cases to happen at the same time. > >> Or else, the skb carring related timestamp flag doesn't know which > >> way of printing is desirable. > >> > >> One thing important is this patch allows print from both applications > >> and bpf program at the same time. Now we have three kinds of print: > >> 1) only BPF program prints > >> 2) only application program prints > >> 3) both can print without side effect > >> > >> Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > Getting back to this thread. It is long, instead of responding to > > multiple messages, let me combine them in a single response. > > > > > > * On future extensions: > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > development path, before we can merge TCP. > > > > Similarly, hardware timestamps need not be supported from the start, > > but must clearly be supportable. > > > > > > * On queueing packets to userspace: > > > >>> the current behavior is to just queue to the sk_error_queue as long > >>> as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > >>> is regardless of the sk_tsflags. " > > > >> Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > >> SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > >> read the skb from the errqueue but are not able to parse the > >> timestamps > > > > Before queuing a packet to userspace on the error queue, the relevant > > reporting flag is always tested. sock_recv_timestamp has: > > > > /* > > * generate control messages if > > * - receive time stamping in software requested > > * - software time stamp available and wanted > > * - hardware time stamps available and wanted > > */ > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > (hwtstamps->hwtstamp && > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > __sock_recv_timestamp(msg, sk, skb); > > > > Otherwise applications could get error messages queued, and > > epoll/poll/select would unexpectedly behave differently. > > I just tried the following diff to remove setsockopt from txtimestamp.c and run > "./txtimestamp -6 -c 1 -C -N -L ::1". It is getting the skb from the error queue > with only cmsg flag. That it surprising and against the API intent as I understand it. Let me reproduce and take a closer look. > I did a printk in __skb_tstamp_tx to ensure the > sk->sk_tsflags is empty also. > > diff --git i/tools/testing/selftests/net/txtimestamp.c > w/tools/testing/selftests/net/txtimestamp.c > index dae91eb97d69..5d9d2773b076 100644 > --- i/tools/testing/selftests/net/txtimestamp.c > +++ w/tools/testing/selftests/net/txtimestamp.c > @@ -319,6 +319,8 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > for (cm = CMSG_FIRSTHDR(msg); > cm && cm->cmsg_len; > cm = CMSG_NXTHDR(msg, cm)) { > + printf("cm->cmsg_level %d cm->cmsg_type %d\n", > + cm->cmsg_level, cm->cmsg_type); > if (cm->cmsg_level == SOL_SOCKET && > cm->cmsg_type == SCM_TIMESTAMPING) { > tss = (void *) CMSG_DATA(cm); > @@ -362,7 +364,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int > payload_len) > if (batch > 1) { > fprintf(stderr, "batched %d timestamps\n", batch); > } else if (!batch) { > - fprintf(stderr, "Failed to report timestamps\n"); > + fprintf(stderr, "Failed to report timestamps. payload_len %d\n", payload_len); > test_failed = true; > } > } > @@ -578,9 +580,12 @@ static void do_test(int family, unsigned int report_opt) > if (cfg_loop_nodata) > sock_opt |= SOF_TIMESTAMPING_OPT_TSONLY; > > + (void)sock_opt; > +/* > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > (char *) &sock_opt, sizeof(sock_opt))) > error(1, 0, "setsockopt timestamping"); > +*/ > > for (i = 0; i < cfg_num_pkts; i++) { > memset(&msg, 0, sizeof(msg)); > > > >> SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > >> features including cmsg mode. But it will not be used in bpf mode. > > > > For simplicity, the two uses of the API are best kept identical. If > > there is a technical reason why BPF has to diverge from established > > behavior, this needs to be explicitly called out in the commit > > message. > > SOF_TIMESTAMPING_OPT_TSONLY will not be supported. The orig_skb can always be > passed directly to the bpf if needed without extra cost. The same probably goes > for SOF_TIMESTAMPING_OPT_PKTINFO. SOF_TIMESTAMPING_SOFTWARE does not seem to be > useful either. I think only a subset of SOF_* will be supported, probably only > the TX_* and RX_* ones. > > > > > Also, if you want to extend the API for BPF in the future, good to > > call this out now and ideally extensions will apply to both, to > > maintain a uniform API. > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > The first is interesting. For application timestamping, this was > > never needed, as the application can just call clock_gettime before > > sendmsg. > > > > In general, additional measurement points are not only useful if the > > interval between is not constant. So far, we have seen no need for > > any additional points. > > > > > > * On skb state: > > > >>> For now, is there thing we can explore to share in the skb_shared_info? > > > > skb_shinfo space is at a premium. I don't think we can justify two > > extra fields just for this use case. > > > >> My initial thought is just to reuse these fields in skb. It can work > >> without interfering one another. > > > > I'm skeptical that two methods can work at the same time. If they are > > started at different times, their sk_tskey will be different, for one. > > For the skb's tx_flags, Jason seems to be able to figure out by only using the > new sk_tsflags_bpf. In the worst case, it seems there is still one bit left in > tx_flags. > > I am also not very positive on the skb's tskey for now. > > Willem, I recalled I had tried to reuse the tx_flags and hwtstamp when keeping > the delivery time in skb->tstamp for a skb redirecting from egress to ingress. I > think that approach was stalled because the tx_flags could be changed by the > netdevice like "skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS". How about the > skb_shinfo(skb)->hwtstamps? At least for the TX path, it should not be changed > until the netdevice calling skb_tstamp_tx() to report the hwtstamp? or the clone > in the tcp stack will still break things if the hwtstamps is reused for other > purpose? True. I think on Tx hwtstamps is only used on the path from the driver tx completion handler to when it calls skb_tstamp_tx. It does not even really have to be an skb field. The first driver cscope happens to point me to indeed just allocates it on the stack: tsnep_tx_poll. > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > specific field, indeed. Or perhaps it can store per-sk shadow state > > that resolves the conflict. For instance, the offset between sk_tskey > > and bpf_tskey. > > I have also been proposing to explore other way for the key since bpf has direct > access to the skb (also the sk, bpf prog can store data in the sk). > > The bpf prog can learn what is the seq_no of the egress-ing skb. When the ack > comes back, it can also learn the ack seq no. Does it help? It will be harder to > use because it probably needs to store this info in the bpf map (or in the bpf > sk storage). However, if it needs to learn the timestamp at the > tcp_sendmsg/tcp_transmit_skb/tcp_write_xmit, this timestamp has to be stored > somewhere also. Either in a bpf map or in a bpf sk storage. > > SEC("cgroup/setsockopt") prog can also enforce the user space setsockopt. e.g. > it can add SOF_TIMESTAMPING_OPT_ID_TCP when user space only use > SOF_TIMESTAMPING_OPT_ID.