On Mon, Oct 21, 2024 at 10:49 PM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > Jason Xing wrote: > > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > Jason Xing wrote: > > > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > > > Willem suggested that we use a static key to control. The advantage > > > > is that we will not affect the existing applications at all if we > > > > don't load BPF program. > > > > > > > > In this patch, except the static key, I also add one logic that is > > > > used to test if the socket has enabled its tsflags in order to > > > > support bpf logic to allow both cases to happen at the same time. > > > > Or else, the skb carring related timestamp flag doesn't know which > > > > way of printing is desirable. > > > > > > > > One thing important is this patch allows print from both applications > > > > and bpf program at the same time. Now we have three kinds of print: > > > > 1) only BPF program prints > > > > 2) only application program prints > > > > 3) both can print without side effect > > > > > > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > > > Getting back to this thread. It is long, instead of responding to > > > multiple messages, let me combine them in a single response. > > > > Thank you so much! > > > > > > > > > > > * On future extensions: > > > > > > +1 that the UDP case, and datagrams more broadly, must have a clear > > > development path, before we can merge TCP. > > > > > > Similarly, hardware timestamps need not be supported from the start, > > > but must clearly be supportable. > > > > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing > > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG) > > could work well for both protos. Let me give it a try first. > > Great, thanks. > > > > > > > > > > * On queueing packets to userspace: > > > > > > > > the current behavior is to just queue to the sk_error_queue as long > > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it > > > > > is regardless of the sk_tsflags. " > > > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > > read the skb from the errqueue but are not able to parse the > > > > timestamps > > > > Above is what I tried to explain how the application timestamping > > feature works, not what I tried to implement for the BPF extension. > > > > > > > > Before queuing a packet to userspace on the error queue, the relevant > > > reporting flag is always tested. sock_recv_timestamp has: > > > > > > /* > > > * generate control messages if > > > * - receive time stamping in software requested > > > * - software time stamp available and wanted > > > * - hardware time stamps available and wanted > > > */ > > > if (sock_flag(sk, SOCK_RCVTSTAMP) || > > > (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) || > > > (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) || > > > (hwtstamps->hwtstamp && > > > (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE))) > > > __sock_recv_timestamp(msg, sk, skb); > > > > > > Otherwise applications could get error messages queued, and > > > epoll/poll/select would unexpectedly behave differently. > > > > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE > > flag for BPF. > > Can you elaborate on this? This sounds like it would go against the > intent to have the two versions of the API (application and BPF) be > equivalent. Oh, I see what you mean here. I have no preference. Well, I can add this report flag into the BPF extension like how application timestamping works. > > > > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > > features including cmsg mode. But it will not be used in bpf mode. > > > > > > For simplicity, the two uses of the API are best kept identical. If > > > there is a technical reason why BPF has to diverge from established > > > behavior, this needs to be explicitly called out in the commit > > > message. > > > > > > Also, if you want to extend the API for BPF in the future, good to > > > call this out now and ideally extensions will apply to both, to > > > maintain a uniform API. > > > > As you said, I also agree on "two uses of the API are best kept identical". > > > > > > > > > > > * On extra measurement points, at sendmsg or tcp_write_xmit: > > > > > > The first is interesting. For application timestamping, this was > > > never needed, as the application can just call clock_gettime before > > > sendmsg. > > > > Yes, we could add it after we finish the current series. I'm going to > > write it down on my todo list. > > > > > > > > In general, additional measurement points are not only useful if the > > > interval between is not constant. So far, we have seen no need for > > > any additional points. > > > > Taking a snapshot of tcp_write_xmit() could be useful especially when > > the skb is not transmitted due to nagle algorithm. > > > > > > > > > > > * On skb state: > > > > > > > > For now, is there thing we can explore to share in the skb_shared_info? > > > > > > skb_shinfo space is at a premium. I don't think we can justify two > > > extra fields just for this use case. > > > > > > > My initial thought is just to reuse these fields in skb. It can work > > > > without interfering one another. > > > > > > I'm skeptical that two methods can work at the same time. If they are > > > started at different times, their sk_tskey will be different, for one. > > > > Right, sk_tskey is the only special one that I will take care of. > > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be > > reused. > > > > > > > > There may be workarounds. Maybe BPF can store its state in some BPF > > > specific field, indeed. Or perhaps it can store per-sk shadow state > > > that resolves the conflict. For instance, the offset between sk_tskey > > > and bpf_tskey. > > > > Things could get complicated in the future if we want to unified the > > final tskey value for all the cases. Since 1) the value of > > shinfo->tskey depends on skb seq and len, 2) the final tskey output is > > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in > > struct sock and related output logic for bpf without caring if it's > > the same as sk_tskey. > > I think we can add fields to struct sock without too much concern. > Adding fields to sk_buff or skb_shared_info would be more difficult. Got it:) > > > That said, the outputs from two methods differ. Do you think it is > > acceptable? It could be simpler and easier if we keep them identical. > > Since we can only have one skb_shared_info.tskey, if both user and bpf > request OPT_ID, starting at different times, then we will have two > bases against which to compute the difference. Having two fields in > struct sock should suffice. Exactly! I will do it. Thanks, Jason