Re: [PATCH net-next v2 04/12] net-timestamp: add static key to control the whole bpf extension

Jason Xing <kerneljasonxing@xxxxxxxxx> · Fri, 18 Oct 2024 11:05:18 +0800

On Fri, Oct 18, 2024 at 10:52 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> >
> > On 10/16/24 7:28 PM, Jason Xing wrote:
> > > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> > >>
> > >> On 10/16/24 3:36 AM, Jason Xing wrote:
> > >>>>> If the skb carries the timestamp, there are three cases:
> > >>>>> 1) non-bpf case and users uses setsockopt()
> > >>>>> 2) cmsg case
> > >>>>> 3) bpf case
> > >>
> > >> These should have tests in the selftests/bpf/ sooner than later. (More below).
> > >>
> > >>>>>
> > >>>>> #1 and #2 are already handled well before this patch. I only need to
> > >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> > >>>>> else it could be #1 or #2, then we will let the old way print
> > >>>>> timestamps in __skb_tstamp_tx().
> > >>>>
> > >>>> hmm... I am still not sure I fully understand...but I think I may start getting it.
> > >>>
> > >>> Sorry, my bad. I gave the wrong answer...
> > >>>
> > >>> It should be:
> > >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
> > >>
> > >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?
> > >
> > > Yep.
> > >
> > >>
> > >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> > >> behavior is to just queue to the sk_error_queue as long as there is
> > >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> > >> sk_tsflags. This will eventually get ignored when user read it from the error
> > >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?
> > >
> > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > read the skb from the errqueue but are not able to parse the
> > > timestamps. Please see
> > > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
> > > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
> > >         ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))
> >
> > afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error
> > still returns the skb to the user.
> >
> > I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not
> > set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka
> > traditional) way is that the generated skb will still be queued in the error
> > queue. The user space can still read it but just won't have the timestamp cmsg.
>
> Apparently, we're on the same page. What you were saying is right, of course :)
>
> >
> > If I understand how the v3 may look like, the skb will not be queued in the
> > error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user
>
> Right, skb will not even be cloned or generated, let alone queue it in
> the error queue. For bpf extension, preventing skb to be queued in the
> error queue is a very vital thing.
>
> > space won't get it from the error queue which is a change of behavior. I was
> > saying I am fine but not sure if someone depends on this behavior.
>
> For bpf part, it's okay to bypass the queuing skb into the errqueue
> logic, which has been discussed at netconf before this series with
> Willem also.
> For non-bpf part, I will not touch and modify their prior behaviour.
>
> >
> > I think we start talking pass each other on this. I will wait for the code on
> > this part and the selftest first.
>
> I will keep this code snippets in V3 so that three kinds of printing
> could work parallelly:
> @@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>         if (!sk)
>                 return;
>
> +       if (static_branch_unlikely(&bpf_tstamp_control))
> +               bpf_skb_tstamp_tx_output(sk, tstype);
> +
>         skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
>  }
>
> So please forget what we've talked about testing the
> SOF_TIMESTAMPING_SOFTWARE flag thing.
>
> >
> > >
> > >> I suspect
> > >> the user space will still read something from the error queue unless there is
> > >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.
> > >
> > > No, please see above.
> > >
> > >>
> > >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> > >> from even queuing to the error queue? I think it is ok but I am not sure if
> > >> anyone is depending on the above behavior.
> > >
> > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> >
> > Got it. This part is now understood.
> >
> > It is one of the reasons for my earlier question on which SOF_* that bpf needs
> > to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but
> > lets leave these nits for a little later.
> >
> > However, it will be very useful to highlight which SOF_* will never be used in
> > bpf in v3.
>
> Got it. Will do that.
>
> >
> > > features including cmsg mode. But it will not be used in bpf mode. So
> > > the test statement is enough to divided those three cases into two
> > > groups.
> >
> > >
> > >>
> > >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> > >>> print based on patch [4/12]; if not, we will use
> > >>> bpf_skb_tstamp_tx_output() to print.
> > >>>
> > >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> > >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> > >>> (please see Documentation/networking/timestamping.rst). We can see a
> > >>> good example on how to use in
> > >>> tools/testing/selftests/net/txtimestamp.c:
> > >>> do_test()
> > >>> {
> > >>>           sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> > >>>           ...
> > >>>           if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> > >>>                                 (char *) &sock_opt, sizeof(sock_opt)))
> > >>> }
> > >>>
> > >>>>
> > >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> > >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> > >>>> It has to be able to set and clear.
> > >>>
> > >>> I cannot find a good time to clear all the sockets which are set
> > >>> through the BPF program. If we detach the BPF program, it will not
> > >>> print of course. Does it really matter if we don't clear the
> > >>> sk_tsflags_bpf?
> > >>
> > >> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> > >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> > >> timestamp. For sockops program that stays forever, not all usages expect to do
> > >> timestamping for the whole lifetime of the connection. If there is a way for the
> > >> prog to turn it on, it should have a way for the prog to turn it off.
> > >
> > > I see what you meant here. If we don't clear sk_tsflags_bpf, after we
> > > detach the bpf program, it will do nothing in __skb_tstamp_tx() and
> > > return earlier. It is almost equal to the effect of turning off. It is
> > > why I don't handle clearing the flag.
> > >
> > >>
> > >> What is the concern of allowing the bpf prog to disable something that it has
> > >> enabled before?
> > >
> > > Let me give one instance:
> > > If one socket is established and stays idle, how can the bpf program
> > > clear the tsflags from that socket? I have no idea.
> >
> > bpf_tcp_iter prog can. That said, the idle connection example is too carry away
> > as an excuse that bpf_setsockopt does not need to support turning-off. Sure,
> > idle connection is as good as off. and yes, detach is as good as off also.
>
> Thanks, I see.
>
> >
> > I am now acting as a broken clock repeating myself that not all use cases run
> > for 5 mins and then detach, so I need to be specific here that bpf_setsockopt
> > not supporting off is a nack. There are many use cases in production that the
> > bpf prog runs forever and wants to turn it on-and-off.
> >
> > Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc
> > that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do
> > periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn
> > it off if needed.
>
> I'm not insisting not to support this. Just curious why. Now I get it.
>
> >
> > I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am
> > asking to support clearing the sk_tsflags_bpf in bpf_setsockopt().
>
> Yeah, I know that.
>
> >
> > I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits.
>
> It's easy for me to support the function clearing the sk_tsflags_bpf
> in the bpf_setsockopt() function. Will do that :)
>
> >
> > >
> > >>
> > >> While we are on bpf_sock_ops_cb_flags, the
> > >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> > >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> > >> what interface to use for bpf timestamping.
> > >
> > > I'm not sure if I understand correctly. I mimicked the use of
> > > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
> > > bpf_sock_ops_cb_flags_set() in BPF program?
> >
> > In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant.
> > It is as good as testing and setting sk_tsflags_bpf alone.
>
> I will remove it, then the code will be simplified.
>
> >
> > This could be some cleanup for the later stage of the set.
> >
> >  > >>
> > >>>
> > >>>>
> > >>>> Does it also mean either the bpf or the user space can enable the timetstamping
> > >>>> but not both? I don't think we can assume this also. It will be hard to deploy
> > >>>> the bpf prog in production to collect continuous data. The user space may have
> > >>>> some timestamping enabled but the bpf may want to do its parallel investigation
> > >>>> also. The user space may rollout timestamping in the future and suddenly break
> > >>>> the bpf prog.
> > >>>
> > >>> Well, IIUC, it's also the basic idea from the current series which
> > >>> allows both happening at the same time. Let us put it in a simple way,
> > >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then
> > >>> one admin deploys the BPF program, that app should be traced both in
> > >>> bpf and non-bpf ways.
> > >>>
> > >>> But Willem doesn't agree about this approach[1] because of hard to debug.
> > >>>
> > >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@xxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > >>> Regarding to this link, I have a few more words to say: the socket
> > >>> could be set through bpf_setsockopt() in different phases not like
> > >>> setsockopt(), so in some cases we cannot make setsockopt hard failed.
> > >>>
> > >>> After rethinking this point more, I still reckon that letting BPF
> > >>> program trace timestamping parallelly without caring whether the
> > >>> socket is set to the SO_TIMESTAMPING feature through setsockopt()
> > >>
> > >> I am afraid having both work in parallel is needed. Otherwise, it will be very
> > >> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> > >> timestamp without worrying about application changes/updates/downgrades is
> > >> important. e.g. App changes from no time stamping to time stamping
> > >
> > > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
> > > :) So let me keep the current logic of printing (see the
> > > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then
> > > I don't need to add some test statements to distinguish which way of
> > > printing.
> > >
> > >>
> > >> Please help to add selftests to show how the above cases (1), (2), (3), and
> > >> other tsflags/txflags sharing cases will work. This should not be delayed until
> > >> the discussion is done. It is needed sooner or later to prove both bpf and
> > >> non-bpf ways can work at the same time. It will help the reviewer and also help
> > >> to think about the design with a real use case in bpf prog.
> > >
> > > Got it. But I'm not sure where I should put those test cases? Could
> > > you help me point out a good example that I can follow?
> >
> > Have you looked at the selftests/bpf directory?
>
> Sure, the full bpf program was written based on the examples in
> selftests/bpf. But I remembered that selftests/bpf is already
> deprecated?
>
> >
> > prog_tests/setget_sockopt.c may be something closer to what you need.
> >
> > There is a recent one in the mailing list also:
> >
> > https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@xxxxxxxxxxx/
> >
> > The expectation is to be able to run the test like this: ./test_progs -t
> > setget_sockopt
>
> Thanks for the pointer. I will spend some time studying it.
>
> >
> > >
> > >>
> > >> The example in patch 0 only prints the reported tstamp, can you share how it
> > >> will be used to investigate issue?
> > >
> > > No problem. Please see chapter 3 about "goal" in
> > > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.
> >
> > Thanks.
> >
> > >
> > >> Is it also useful to know when the skb is
> > >> written to the kernel during sendmsg()?
> > >
> > > You are right. Before this patch, normally applications will record an
> > > accurate timestamp before do sendmsg().
> > >
> > > After you remind me of this, I feel that we can add the timestamp
> > > print in the future for bpf use.
> >
> > Yes, please add the sendmsg timestamp capturing in the selftest. It is useful.
> >
> > >
> > >>
> > >> Regarding the bpf_setsockopt() can be called in different phase,
> > >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> > >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> > >> surprised people will try to trigger some on-and-off timestamping from
> > >> bpf-tcp-cc to measure some delay.
> > >>
> > >>
> > >> More about bpf_setsockopt() in different phase, understand that UDP is not your
> > >> priority. However, it needs to have some clarity on how UDP will work and how to
> > >> enable it. UDP usually has no connect/established phase.
> > >
> > > For now, I don't expect an extension for UDP because it will bring too
> > > much extra work. Could we discuss this later? I mean, after we finish
> > > the basic bpf extension :)
> >
> > Later is fine but before this set lands. I am not asking a full UDP
> > implementation but need ideas on how that may look like. We need some clarity on
> > how UDP will work and also how much new sockops API extension will be needed to
> > decide if sockops is the correct one going forward. I don't want to end up tcp
> > is in sockops and UDP (and others) is non sockops.

The differences between TCP and UDP are:
1) TCP supports SOF_TIMESTAMPING_TX_ACK, SOF_TIMESTAMPING_OPT_ID_TCP,
while UDP does not.
2) the tskey works in different ways.
We have a good example to run and test:
tools/testing/selftests/net/txtimestamp.c

If that arouses your interesting, you could try:
1) for UDP, ./txtimestamp -4 -L 127.0.0.1 -l 1000  -c 2 -u
2) for TCP, ./txtimestamp -4 -L 127.0.0.1 -l 1000  -c 2

I think the V3 series could easily support the UDP protocol. Let me
try, then we could discuss more based on V3.

Thanks,
Jason