Re: [PATCH net-next v2 04/12] net-timestamp: add static key to control the whole bpf extension

Jason Xing <kerneljasonxing@xxxxxxxxx> · Fri, 18 Oct 2024 10:52:04 +0800




On Fri, Oct 18, 2024 at 4:43 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 10/16/24 7:28 PM, Jason Xing wrote:
> > On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> >>
> >> On 10/16/24 3:36 AM, Jason Xing wrote:
> >>>>> If the skb carries the timestamp, there are three cases:
> >>>>> 1) non-bpf case and users uses setsockopt()
> >>>>> 2) cmsg case
> >>>>> 3) bpf case
> >>
> >> These should have tests in the selftests/bpf/ sooner than later. (More below).
> >>
> >>>>>
> >>>>> #1 and #2 are already handled well before this patch. I only need to
> >>>>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> >>>>> else it could be #1 or #2, then we will let the old way print
> >>>>> timestamps in __skb_tstamp_tx().
> >>>>
> >>>> hmm... I am still not sure I fully understand...but I think I may start getting it.
> >>>
> >>> Sorry, my bad. I gave the wrong answer...
> >>>
> >>> It should be:
> >>> Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
> >>
> >> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?
> >
> > Yep.
> >
> >>
> >> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> >> behavior is to just queue to the sk_error_queue as long as there is
> >> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> >> sk_tsflags. This will eventually get ignored when user read it from the error
> >> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?
> >
> > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > read the skb from the errqueue but are not able to parse the
> > timestamps. Please see
> > tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
> > if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
> >         ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))
>
> afaict, __sock_recv_timestamp does not put the timestamp cmsg but ip_recv_error
> still returns the skb to the user.
>
> I suspect we are talking the same thing. When SOF_TIMESTAMPING_SOFTWARE is not
> set in sk and SOF_TIMESTAMPING_TX_* is set in cmsg, the existing (aka
> traditional) way is that the generated skb will still be queued in the error
> queue. The user space can still read it but just won't have the timestamp cmsg.

Apparently, we're on the same page. What you were saying is right, of course :)

>
> If I understand how the v3 may look like, the skb will not be queued in the
> error queue at all because the sk has no SOF_TIMESTAMPING_SOFTWARE. The user

Right, skb will not even be cloned or generated, let alone queue it in
the error queue. For bpf extension, preventing skb to be queued in the
error queue is a very vital thing.

> space won't get it from the error queue which is a change of behavior. I was
> saying I am fine but not sure if someone depends on this behavior.

For bpf part, it's okay to bypass the queuing skb into the errqueue
logic, which has been discussed at netconf before this series with
Willem also.
For non-bpf part, I will not touch and modify their prior behaviour.

>
> I think we start talking pass each other on this. I will wait for the code on
> this part and the selftest first.

I will keep this code snippets in V3 so that three kinds of printing
could work parallelly:
@@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
        if (!sk)
                return;

+       if (static_branch_unlikely(&bpf_tstamp_control))
+               bpf_skb_tstamp_tx_output(sk, tstype);
+
        skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk, tstype);
 }

So please forget what we've talked about testing the
SOF_TIMESTAMPING_SOFTWARE flag thing.

>
> >
> >> I suspect
> >> the user space will still read something from the error queue unless there is
> >> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.
> >
> > No, please see above.
> >
> >>
> >> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> >> from even queuing to the error queue? I think it is ok but I am not sure if
> >> anyone is depending on the above behavior.
> >
> > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
>
> Got it. This part is now understood.
>
> It is one of the reasons for my earlier question on which SOF_* that bpf needs
> to support. I want to simplify the naming part of the SOF_* in bpf_sesockopt but
> lets leave these nits for a little later.
>
> However, it will be very useful to highlight which SOF_* will never be used in
> bpf in v3.

Got it. Will do that.

>
> > features including cmsg mode. But it will not be used in bpf mode. So
> > the test statement is enough to divided those three cases into two
> > groups.
>
> >
> >>
> >>> work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> >>> print based on patch [4/12]; if not, we will use
> >>> bpf_skb_tstamp_tx_output() to print.
> >>>
> >>> If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> >>> always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> >>> (please see Documentation/networking/timestamping.rst). We can see a
> >>> good example on how to use in
> >>> tools/testing/selftests/net/txtimestamp.c:
> >>> do_test()
> >>> {
> >>>           sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> >>>           ...
> >>>           if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> >>>                                 (char *) &sock_opt, sizeof(sock_opt)))
> >>> }
> >>>
> >>>>
> >>>> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> >>>> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> >>>> It has to be able to set and clear.
> >>>
> >>> I cannot find a good time to clear all the sockets which are set
> >>> through the BPF program. If we detach the BPF program, it will not
> >>> print of course. Does it really matter if we don't clear the
> >>> sk_tsflags_bpf?
> >>
> >> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> >> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> >> timestamp. For sockops program that stays forever, not all usages expect to do
> >> timestamping for the whole lifetime of the connection. If there is a way for the
> >> prog to turn it on, it should have a way for the prog to turn it off.
> >
> > I see what you meant here. If we don't clear sk_tsflags_bpf, after we
> > detach the bpf program, it will do nothing in __skb_tstamp_tx() and
> > return earlier. It is almost equal to the effect of turning off. It is
> > why I don't handle clearing the flag.
> >
> >>
> >> What is the concern of allowing the bpf prog to disable something that it has
> >> enabled before?
> >
> > Let me give one instance:
> > If one socket is established and stays idle, how can the bpf program
> > clear the tsflags from that socket? I have no idea.
>
> bpf_tcp_iter prog can. That said, the idle connection example is too carry away
> as an excuse that bpf_setsockopt does not need to support turning-off. Sure,
> idle connection is as good as off. and yes, detach is as good as off also.

Thanks, I see.

>
> I am now acting as a broken clock repeating myself that not all use cases run
> for 5 mins and then detach, so I need to be specific here that bpf_setsockopt
> not supporting off is a nack. There are many use cases in production that the
> bpf prog runs forever and wants to turn it on-and-off.
>
> Again, bpf sockops prog is not the only one can bpf_setsockopt(). bpf-tcp-cc
> that runs forever can also bpf_setsockopt to ask the sockops bpf prog to do
> periodic timestamping when needed. bpf_tcp_iter can also bpf_setsockopt to turn
> it off if needed.

I'm not insisting not to support this. Just curious why. Now I get it.

>
> I am not asking to clear the sk_tsflags_bpf when the bpf prog is detached. I am
> asking to support clearing the sk_tsflags_bpf in bpf_setsockopt().

Yeah, I know that.

>
> I have still yet heard a technical reason why bpf_setsockopt cannot clear the bits.

It's easy for me to support the function clearing the sk_tsflags_bpf
in the bpf_setsockopt() function. Will do that :)

>
> >
> >>
> >> While we are on bpf_sock_ops_cb_flags, the
> >> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> >> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> >> what interface to use for bpf timestamping.
> >
> > I'm not sure if I understand correctly. I mimicked the use of
> > BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
> > bpf_sock_ops_cb_flags_set() in BPF program?
>
> In patch 5, I meant the BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG is redundant.
> It is as good as testing and setting sk_tsflags_bpf alone.

I will remove it, then the code will be simplified.

>
> This could be some cleanup for the later stage of the set.
>
>  > >>
> >>>
> >>>>
> >>>> Does it also mean either the bpf or the user space can enable the timetstamping
> >>>> but not both? I don't think we can assume this also. It will be hard to deploy
> >>>> the bpf prog in production to collect continuous data. The user space may have
> >>>> some timestamping enabled but the bpf may want to do its parallel investigation
> >>>> also. The user space may rollout timestamping in the future and suddenly break
> >>>> the bpf prog.
> >>>
> >>> Well, IIUC, it's also the basic idea from the current series which
> >>> allows both happening at the same time. Let us put it in a simple way,
> >>> I hope that if the app uses the SO_TIMESTAMPING feature already, then
> >>> one admin deploys the BPF program, that app should be traced both in
> >>> bpf and non-bpf ways.
> >>>
> >>> But Willem doesn't agree about this approach[1] because of hard to debug.
> >>>
> >>> [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@xxxxxxxxxxxxxxxxxxxxxx.notmuch/
> >>> Regarding to this link, I have a few more words to say: the socket
> >>> could be set through bpf_setsockopt() in different phases not like
> >>> setsockopt(), so in some cases we cannot make setsockopt hard failed.
> >>>
> >>> After rethinking this point more, I still reckon that letting BPF
> >>> program trace timestamping parallelly without caring whether the
> >>> socket is set to the SO_TIMESTAMPING feature through setsockopt()
> >>
> >> I am afraid having both work in parallel is needed. Otherwise, it will be very
> >> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> >> timestamp without worrying about application changes/updates/downgrades is
> >> important. e.g. App changes from no time stamping to time stamping
> >
> > Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
> > :) So let me keep the current logic of printing (see the
> > __skb_tstamp_tx() function in patch [04/12]) in the next version. Then
> > I don't need to add some test statements to distinguish which way of
> > printing.
> >
> >>
> >> Please help to add selftests to show how the above cases (1), (2), (3), and
> >> other tsflags/txflags sharing cases will work. This should not be delayed until
> >> the discussion is done. It is needed sooner or later to prove both bpf and
> >> non-bpf ways can work at the same time. It will help the reviewer and also help
> >> to think about the design with a real use case in bpf prog.
> >
> > Got it. But I'm not sure where I should put those test cases? Could
> > you help me point out a good example that I can follow?
>
> Have you looked at the selftests/bpf directory?

Sure, the full bpf program was written based on the examples in
selftests/bpf. But I remembered that selftests/bpf is already
deprecated?

>
> prog_tests/setget_sockopt.c may be something closer to what you need.
>
> There is a recent one in the mailing list also:
>
> https://lore.kernel.org/all/20241016-syncookie-v1-0-3b7a0de12153@xxxxxxxxxxx/
>
> The expectation is to be able to run the test like this: ./test_progs -t
> setget_sockopt

Thanks for the pointer. I will spend some time studying it.

>
> >
> >>
> >> The example in patch 0 only prints the reported tstamp, can you share how it
> >> will be used to investigate issue?
> >
> > No problem. Please see chapter 3 about "goal" in
> > https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.
>
> Thanks.
>
> >
> >> Is it also useful to know when the skb is
> >> written to the kernel during sendmsg()?
> >
> > You are right. Before this patch, normally applications will record an
> > accurate timestamp before do sendmsg().
> >
> > After you remind me of this, I feel that we can add the timestamp
> > print in the future for bpf use.
>
> Yes, please add the sendmsg timestamp capturing in the selftest. It is useful.
>
> >
> >>
> >> Regarding the bpf_setsockopt() can be called in different phase,
> >> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> >> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> >> surprised people will try to trigger some on-and-off timestamping from
> >> bpf-tcp-cc to measure some delay.
> >>
> >>
> >> More about bpf_setsockopt() in different phase, understand that UDP is not your
> >> priority. However, it needs to have some clarity on how UDP will work and how to
> >> enable it. UDP usually has no connect/established phase.
> >
> > For now, I don't expect an extension for UDP because it will bring too
> > much extra work. Could we discuss this later? I mean, after we finish
> > the basic bpf extension :)
>
> Later is fine but before this set lands. I am not asking a full UDP
> implementation but need ideas on how that may look like. We need some clarity on
> how UDP will work and also how much new sockops API extension will be needed to
> decide if sockops is the correct one going forward. I don't want to end up tcp
> is in sockops and UDP (and others) is non sockops.

I see.

After removing the use of "BPF_SOCK_OPS_TEST_FLAG(tp,
BPF_SOCK_OPS_RX_TIMESTAMPING_OPT_CB_FLAG)", the bpf extension will be
limited only in TCP, I think.

Using sk->sk_tsflags_bpf as an indicator could work for both protocols.

Thanks for your great help :)

Thanks,
Jason