On Fri, Oct 18, 2024 at 10:52 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote: > > On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > On 10/16/24 7:28 PM, Jason Xing wrote: > > > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > >> > > >> On 10/16/24 3:36 AM, Jason Xing wrote: > > >>>>> If the skb carries the timestamp, there are three cases: > > >>>>> 1) non-bpf case and users uses setsockopt() > > >>>>> 2) cmsg case > > >>>>> 3) bpf case > > >> > > >> These should have tests in the selftests/bpf/ sooner than later. (More below). > > >> > > >>>>> > > >>>>> #1 and #2 are already handled well before this patch. I only need to > > >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or > > >>>>> else it could be #1 or #2, then we will let the old way print > > >>>>> timestamps in __skb_tstamp_tx(). > > >>>> > > >>>> hmm... I am still not sure I fully understand...but I think I may start getting it. > > >>> > > >>> Sorry, my bad. I gave the wrong answer... > > >>> > > >>> It should be: > > >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should > > >> > > >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()? > > > > > > Yep. > > > > > >> > > >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current > > >> behavior is to just queue to the sk_error_queue as long as there is > > >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the > > >> sk_tsflags. This will eventually get ignored when user read it from the error > > >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags? > > > > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can > > > read the skb from the errqueue but are not able to parse the > > > timestamps. Please see > > > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp(): > > > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE... > > > ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0)) > > > > afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error > > still returns the skb to the user. > > > > I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not > > set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka > > traditional) way is that the generated skb will still be queued in the error > > queue. The user space can still read it but just won't have the timestamp cmsg. > > Apparently, we're on the same page. What you were saying is right, of course :) > > > > > If I understand how the v3 may look like, the skb will not be queued in the > > error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user > > Right, skb will not even be cloned or generated, let alone queue it in > the error queue. For bpf extension, preventing skb to be queued in the > error queue is a very vital thing. > > > space won't get it from the error queue which is a change of behavior. I was > > saying I am fine but not sure if someone depends on this behavior. > > For bpf part, it's okay to bypass the queuing skb into the errqueue > logic, which has been discussed at netconf before this series with > Willem also. > For non-bpf part, I will not touch and modify their prior behaviour. > > > > > I think we start talking pass each other on this. I will wait for the code on > > this part and the selftest first. > > I will keep this code snippets in V3 so that three kinds of printing > could work parallelly: > @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb, > if (!sk) > return; > > + if (static_branch_unlikely(&bpf_tstamp_control)) > + bpf_skb_tstamp_tx_output(sk, tstype); > + > skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype); > } > > So please forget what we've talked about testing the > SOF_TIMESTAMPING_SOFTWARE flag thing. > > > > > > > > >> I suspect > > >> the user space will still read something from the error queue unless there is > > >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg. > > > > > > No, please see above. > > > > > >> > > >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it > > >> from even queuing to the error queue? I think it is ok but I am not sure if > > >> anyone is depending on the above behavior. > > > > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING > > > > Got it. This part is now understood. > > > > It is one of the reasons for my earlier question on which SOF_* that bpf needs > > to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but > > lets leave these nits for a little later. > > > > However, it will be very useful to highlight which SOF_* will never be used in > > bpf in v3. > > Got it. Will do that. > > > > > > features including cmsg mode. But it will not be used in bpf mode. So > > > the test statement is enough to divided those three cases into two > > > groups. > > > > > > > >> > > >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to > > >>> print based on patch [4/12]; if not, we will use > > >>> bpf_skb_tstamp_tx_output() to print. > > >>> > > >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags > > >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag > > >>> (please see Documentation/networking/timestamping.rst). We can see a > > >>> good example on how to use in > > >>> tools/testing/selftests/net/txtimestamp.c: > > >>> do_test() > > >>> { > > >>> sock_opt = SOF_TIMESTAMPING_SOFTWARE | > > >>> ... > > >>> if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, > > >>> (char *) &sock_opt, sizeof(sock_opt))) > > >>> } > > >>> > > >>>> > > >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once > > >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many. > > >>>> It has to be able to set and clear. > > >>> > > >>> I cannot find a good time to clear all the sockets which are set > > >>> through the BPF program. If we detach the BPF program, it will not > > >>> print of course. Does it really matter if we don't clear the > > >>> sk_tsflags_bpf? > > >> > > >> Yes, it matters. The same reason goes for why the existing bpf prog can clear > > >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the > > >> timestamp. For sockops program that stays forever, not all usages expect to do > > >> timestamping for the whole lifetime of the connection. If there is a way for the > > >> prog to turn it on, it should have a way for the prog to turn it off. > > > > > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we > > > detach the bpf program, it will do nothing in __skb_tstamp_tx() and > > > return earlier. It is almost equal to the effect of turning off. It is > > > why I don't handle clearing the flag. > > > > > >> > > >> What is the concern of allowing the bpf prog to disable something that it has > > >> enabled before? > > > > > > Let me give one instance: > > > If one socket is established and stays idle, how can the bpf program > > > clear the tsflags from that socket? I have no idea. > > > > bpf_tcp_iter prog can. That said, the idle connection example is too carry away > > as an excuse that bpf_setsockopt does not need to support turning-off. Sure, > > idle connection is as good as off. and yes, detach is as good as off also. > > Thanks, I see. > > > > > I am now acting as a broken clock repeating myself that not all use cases run > > for 5 mins and then detach, so I need to be specific here that bpf_setsockopt > > not supporting off is a nack. There are many use cases in production that the > > bpf prog runs forever and wants to turn it on-and-off. > > > > Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc > > that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do > > periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn > > it off if needed. > > I'm not insisting not to support this. Just curious why. Now I get it. > > > > > I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am > > asking to support clearing the sk_tsflags_bpf in bpf_setsockopt(). > > Yeah, I know that. > > > > > I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits. > > It's easy for me to support the function clearing the sk_tsflags_bpf > in the bpf_setsockopt() function. Will do that :) > > > > > > > > >> > > >> While we are on bpf_sock_ops_cb_flags, the > > >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in > > >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide > > >> what interface to use for bpf timestamping. > > > > > > I'm not sure if I understand correctly. I mimicked the use of > > > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of > > > bpf_sock_ops_cb_flags_set() in BPF program? > > > > In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant. > > It is as good as testing and setting sk_tsflags_bpf alone. > > I will remove it, then the code will be simplified. > > > > > This could be some cleanup for the later stage of the set. > > > > > >> > > >>> > > >>>> > > >>>> Does it also mean either the bpf or the user space can enable the timetstamping > > >>>> but not both? I don't think we can assume this also. It will be hard to deploy > > >>>> the bpf prog in production to collect continuous data. The user space may have > > >>>> some timestamping enabled but the bpf may want to do its parallel investigation > > >>>> also. The user space may rollout timestamping in the future and suddenly break > > >>>> the bpf prog. > > >>> > > >>> Well, IIUC, it's also the basic idea from the current series which > > >>> allows both happening at the same time. Let us put it in a simple way, > > >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then > > >>> one admin deploys the BPF program, that app should be traced both in > > >>> bpf and non-bpf ways. > > >>> > > >>> But Willem doesn't agree about this approach[1] because of hard to debug. > > >>> > > >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@xxxxxxxxxxxxxxxxxxxxxx.notmuch/ > > >>> Regarding to this link, I have a few more words to say: the socket > > >>> could be set through bpf_setsockopt() in different phases not like > > >>> setsockopt(), so in some cases we cannot make setsockopt hard failed. > > >>> > > >>> After rethinking this point more, I still reckon that letting BPF > > >>> program trace timestamping parallelly without caring whether the > > >>> socket is set to the SO_TIMESTAMPING feature through setsockopt() > > >> > > >> I am afraid having both work in parallel is needed. Otherwise, it will be very > > >> hard to deploy a bpf prog to run continuously in scale. Being able to collect > > >> timestamp without worrying about application changes/updates/downgrades is > > >> important. e.g. App changes from no time stamping to time stamping > > > > > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you > > > :) So let me keep the current logic of printing (see the > > > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then > > > I don't need to add some test statements to distinguish which way of > > > printing. > > > > > >> > > >> Please help to add selftests to show how the above cases (1), (2), (3), and > > >> other tsflags/txflags sharing cases will work. This should not be delayed until > > >> the discussion is done. It is needed sooner or later to prove both bpf and > > >> non-bpf ways can work at the same time. It will help the reviewer and also help > > >> to think about the design with a real use case in bpf prog. > > > > > > Got it. But I'm not sure where I should put those test cases? Could > > > you help me point out a good example that I can follow? > > > > Have you looked at the selftests/bpf directory? > > Sure, the full bpf program was written based on the examples in > selftests/bpf. But I remembered that selftests/bpf is already > deprecated? > > > > > prog_tests/setget_sockopt.c may be something closer to what you need. > > > > There is a recent one in the mailing list also: > > > > https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@xxxxxxxxxxx/ > > > > The expectation is to be able to run the test like this: ./test_progs -t > > setget_sockopt > > Thanks for the pointer. I will spend some time studying it. > > > > > > > > >> > > >> The example in patch 0 only prints the reported tstamp, can you share how it > > >> will be used to investigate issue? > > > > > > No problem. Please see chapter 3 about "goal" in > > > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf. > > > > Thanks. > > > > > > > >> Is it also useful to know when the skb is > > >> written to the kernel during sendmsg()? > > > > > > You are right. Before this patch, normally applications will record an > > > accurate timestamp before do sendmsg(). > > > > > > After you remind me of this, I feel that we can add the timestamp > > > print in the future for bpf use. > > > > Yes, please add the sendmsg timestamp capturing in the selftest. It is useful. > > > > > > > >> > > >> Regarding the bpf_setsockopt() can be called in different phase, > > >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called > > >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be > > >> surprised people will try to trigger some on-and-off timestamping from > > >> bpf-tcp-cc to measure some delay. > > >> > > >> > > >> More about bpf_setsockopt() in different phase, understand that UDP is not your > > >> priority. However, it needs to have some clarity on how UDP will work and how to > > >> enable it. UDP usually has no connect/established phase. > > > > > > For now, I don't expect an extension for UDP because it will bring too > > > much extra work. Could we discuss this later? I mean, after we finish > > > the basic bpf extension :) > > > > Later is fine but before this set lands. I am not asking a full UDP > > implementation but need ideas on how that may look like. We need some clarity on > > how UDP will work and also how much new sockops API extension will be needed to > > decide if sockops is the correct one going forward. I don't want to end up tcp > > is in sockops and UDP (and others) is non sockops. The differences between TCP and UDP are: 1) TCP supports SOF_TIMESTAMPING_TX_ACK, SOF_TIMESTAMPING_OPT_ID_TCP, while UDP does not. 2) the tskey works in different ways. We have a good example to run and test: tools/testing/selftests/net/txtimestamp.c If that arouses your interesting, you could try: 1) for UDP, ./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 2 -u 2) for TCP, ./txtimestamp -4 -L 127.0.0.1 -l 1000 -c 2 I think the V3 series could easily support the UDP protocol. Let me try, then we could discuss more based on V3. Thanks, Jason