Jason Xing wrote: > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > Jason Xing wrote: > > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > Willem suggested that we use a static key to control. The advantage > > > is that we will not affect the existing applications at all if we > > > don't load BPF program. > > > > > > In this patch, except the static key, I also add one logic that is > > > used to test if the socket has enabled its tsflags in order to > > > support bpf logic to allow both cases to happen at the same time. > > > Or else, the skb carring related timestamp flag doesn't know which > > > way of printing is desirable. > > > > > > One thing important is this patch allows print from both applications > > > and bpf program at the same time. Now we have three kinds of print: > > > 1) only BPF program prints > > > 2) only application program prints > > > 3) both can print without side effect > > > > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > Getting back to this thread. It is long, instead of responding to > > multiple messages, let me combine them in a single response. > > Thank you so much! > > > > > > > * On future extensions: > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > development path, before we can merge TCP. > > > > Similarly, hardware timestamps need not be supported from the start, > > but must clearly be supportable. > > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG) > could work well for both protos. Let me give it a try first. Great, thanks. > > > > > > * On queueing packets to userspace: > > > > > > the current behavior is to just queue to the sk_error_queue as long > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > > > is regardless of the sk_tsflags. " > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > read the skb from the errqueue but are not able to parse the > > > timestamps > > Above is what I tried to explain how the application timestamping > feature works, not what I tried to implement for the BPF extension. > > > > > Before queuing a packet to userspace on the error queue, the relevant > > reporting flag is always tested. sock_recv_timestamp has: > > > > /* > > * generate control messages if > > * - receive time stamping in software requested > > * - software time stamp available and wanted > > * - hardware time stamps available and wanted > > */ > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > (hwtstamps->hwtstamp && > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > __sock_recv_timestamp(msg, sk, skb); > > > > Otherwise applications could get error messages queued, and > > epoll/poll/select would unexpectedly behave differently. > > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE > flag for BPF. Can you elaborate on this? This sounds like it would go against the intent to have the two versions of the API (application and BPF) be equivalent. > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > features including cmsg mode. But it will not be used in bpf mode. > > > > For simplicity, the two uses of the API are best kept identical. If > > there is a technical reason why BPF has to diverge from established > > behavior, this needs to be explicitly called out in the commit > > message. > > > > Also, if you want to extend the API for BPF in the future, good to > > call this out now and ideally extensions will apply to both, to > > maintain a uniform API. > > As you said, I also agree on "two uses of the API are best kept identical". > > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > The first is interesting. For application timestamping, this was > > never needed, as the application can just call clock_gettime before > > sendmsg. > > Yes, we could add it after we finish the current series. I'm going to > write it down on my todo list. > > > > > In general, additional measurement points are not only useful if the > > interval between is not constant. So far, we have seen no need for > > any additional points. > > Taking a snapshot of tcp_write_xmit() could be useful especially when > the skb is not transmitted due to nagle algorithm. > > > > > > > * On skb state: > > > > > > For now, is there thing we can explore to share in the skb_shared_info? > > > > skb_shinfo space is at a premium. I don't think we can justify two > > extra fields just for this use case. > > > > > My initial thought is just to reuse these fields in skb. It can work > > > without interfering one another. > > > > I'm skeptical that two methods can work at the same time. If they are > > started at different times, their sk_tskey will be different, for one. > > Right, sk_tskey is the only special one that I will take care of. > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be > reused. > > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > specific field, indeed. Or perhaps it can store per-sk shadow state > > that resolves the conflict. For instance, the offset between sk_tskey > > and bpf_tskey. > > Things could get complicated in the future if we want to unified the > final tskey value for all the cases. Since 1) the value of > shinfo->tskey depends on skb seq and len, 2) the final tskey output is > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in > struct sock and related output logic for bpf without caring if it's > the same as sk_tskey. I think we can add fields to struct sock without too much concern. Adding fields to sk_buff or skb_shared_info would be more difficult. > That said, the outputs from two methods differ. Do you think it is > acceptable? It could be simpler and easier if we keep them identical. Since we can only have one skb_shared_info.tskey, if both user and bpf request OPT_ID, starting at different times, then we will have two bases against which to compute the difference. Having two fields in struct sock should suffice.