On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 10/16/24 3:36 AM, Jason Xing wrote: > >>> If the skb carries the timestamp, there are three cases: > >>> 1) non-bpf case and users uses setsockopt() > >>> 2) cmsg case > >>> 3) bpf case > > These should have tests in the selftests/bpf/ sooner than later. (More below). > > >>> > >>> #1 and #2 are already handled well before this patch. I only need to > >>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > >>> else it could be #1 or #2, then we will let the old way print > >>> timestamps in __skb_tstamp_tx(). > >> > >> hmm... I am still not sure I fully understand...but I think I may start getting it. > > > > Sorry, my bad. I gave the wrong answer... > > > > It should be: > > Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should > > You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? Yep. > > Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current > behavior is to just queue to the sk_error_queue as long as there is > "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the > sk_tsflags. This will eventually get ignored when user read it from the error > queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can read the skb from the errqueue but are not able to parse the timestamps. Please see tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) > I suspect > the user space will still read something from the error queue unless there is > SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. No, please see above. > > Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it > from even queuing to the error queue? I think it is ok but I am not sure if > anyone is depending on the above behavior. SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING features including cmsg mode. But it will not be used in bpf mode. So the test statement is enough to divided those three cases into two groups. > > > work fine. If it has the flag, we could use skb_tstamp_tx_output() to > > print based on patch [4/12]; if not, we will use > > bpf_skb_tstamp_tx_output() to print. > > > > If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > > always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > > (please see Documentation/networking/timestamping.rst). We can see a > > good example on how to use in > > tools/testing/selftests/net/txtimestamp.c: > > do_test() > > { > > sock_opt = SOF_TIMESTAMPING_SOFTWARE | > > ... > > if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > > (char *) &sock_opt, sizeof(sock_opt))) > > } > > > >> > >> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > >> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > >> It has to be able to set and clear. > > > > I cannot find a good time to clear all the sockets which are set > > through the BPF program. If we detach the BPF program, it will not > > print of course. Does it really matter if we don't clear the > > sk_tsflags_bpf? > > Yes, it matters. The same reason goes for why the existing bpf prog can clear > the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the > timestamp. For sockops program that stays forever, not all usages expect to do > timestamping for the whole lifetime of the connection. If there is a way for the > prog to turn it on, it should have a way for the prog to turn it off. I see what you meant here. If we don't clear sk_tsflags_bpf, after we detach the bpf program, it will do nothing in __skb_tstamp_tx() and return earlier. It is almost equal to the effect of turning off. It is why I don't handle clearing the flag. > > What is the concern of allowing the bpf prog to disable something that it has > enabled before? Let me give one instance: If one socket is established and stays idle, how can the bpf program clear the tsflags from that socket? I have no idea. > > While we are on bpf_sock_ops_cb_flags, the > BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in > the new sk_tsflags_bpf. It is something we need to clean up later when we decide > what interface to use for bpf timestamping. I'm not sure if I understand correctly. I mimicked the use of BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of bpf_sock_ops_cb_flags_set() in BPF program? > > > > >> > >> Does it also mean either the bpf or the user space can enable the timetstamping > >> but not both? I don't think we can assume this also. It will be hard to deploy > >> the bpf prog in production to collect continuous data. The user space may have > >> some timestamping enabled but the bpf may want to do its parallel investigation > >> also. The user space may rollout timestamping in the future and suddenly break > >> the bpf prog. > > > > Well, IIUC, it's also the basic idea from the current series which > > allows both happening at the same time. Let us put it in a simple way, > > I hope that if the app uses the SO_TIMESTAMPING feature already, then > > one admin deploys the BPF program, that app should be traced both in > > bpf and non-bpf ways. > > > > But Willem doesn't agree about this approach[1] because of hard to debug. > > > > [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@xxxxxxxxxxxxxxxxxxxxxx.notmuch/ > > Regarding to this link, I have a few more words to say: the socket > > could be set through bpf_setsockopt() in different phases not like > > setsockopt(), so in some cases we cannot make setsockopt hard failed. > > > > After rethinking this point more, I still reckon that letting BPF > > program trace timestamping parallelly without caring whether the > > socket is set to the SO_TIMESTAMPING feature through setsockopt() > > I am afraid having both work in parallel is needed. Otherwise, it will be very > hard to deploy a bpf prog to run continuously in scale. Being able to collect > timestamp without worrying about application changes/updates/downgrades is > important. e.g. App changes from no time stamping to time stamping Sorry, I didn't make myself clear. Yesterday, I said I agreed with you :) So let me keep the current logic of printing (see the __skb_tstamp_tx() function in patch [04/12]) in the next version. Then I don't need to add some test statements to distinguish which way of printing. > > Please help to add selftests to show how the above cases (1), (2), (3), and > other tsflags/txflags sharing cases will work. This should not be delayed until > the discussion is done. It is needed sooner or later to prove both bpf and > non-bpf ways can work at the same time. It will help the reviewer and also help > to think about the design with a real use case in bpf prog. Got it. But I'm not sure where I should put those test cases? Could you help me point out a good example that I can follow? > > The example in patch 0 only prints the reported tstamp, can you share how it > will be used to investigate issue? No problem. Please see chapter 3 about "goal" in https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. > Is it also useful to know when the skb is > written to the kernel during sendmsg()? You are right. Before this patch, normally applications will record an accurate timestamp before do sendmsg(). After you remind me of this, I feel that we can add the timestamp print in the future for bpf use. > > Regarding the bpf_setsockopt() can be called in different phase, > bpf_setsockopt() is not limited to sockops program. e.g. it can also be called > from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be > surprised people will try to trigger some on-and-off timestamping from > bpf-tcp-cc to measure some delay. > > > More about bpf_setsockopt() in different phase, understand that UDP is not your > priority. However, it needs to have some clarity on how UDP will work and how to > enable it. UDP usually has no connect/established phase. For now, I don't expect an extension for UDP because it will bring too much extra work. Could we discuss this later? I mean, after we finish the basic bpf extension :) > > Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are > planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in > another thread. What else? In this patch series, I support SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE, which you've already noticed from the BPF example in patch [0/12]. They all come from the original design of SO_TIMESTAMPING feature. The question you proposed is what I am willing to implement in the future, like adding one hook in tcp_write_xmit()? It's part of my plans to extend in the future, not be included in this series. Thanks for your review. Thanks, Jason