Re: [PATCH net-next v2 04/12] net-timestamp: add static key to control the whole bpf extension

Jason Xing <kerneljasonxing@xxxxxxxxx> · Mon, 21 Oct 2024 23:05:46 +0800

On Mon, Oct 21, 2024 at 10:49 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn
> > <willemdebruijn.kernel@xxxxxxxxx> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> > > >
> > > > Willem suggested that we use a static key to control. The advantage
> > > > is that we will not affect the existing applications at all if we
> > > > don't load BPF program.
> > > >
> > > > In this patch, except the static key, I also add one logic that is
> > > > used to test if the socket has enabled its tsflags in order to
> > > > support bpf logic to allow both cases to happen at the same time.
> > > > Or else, the skb carring related timestamp flag doesn't know which
> > > > way of printing is desirable.
> > > >
> > > > One thing important is this patch allows print from both applications
> > > > and bpf program at the same time. Now we have three kinds of print:
> > > > 1) only BPF program prints
> > > > 2) only application program prints
> > > > 3) both can print without side effect
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> > >
> > > Getting back to this thread. It is long, instead of responding to
> > > multiple messages, let me combine them in a single response.
> >
> > Thank you so much!
> >
> > >
> > >
> > > * On future extensions:
> > >
> > > +1 that the UDP case, and datagrams more broadly, must have a clear
> > > development path, before we can merge TCP.
> > >
> > > Similarly, hardware timestamps need not be supported from the start,
> > > but must clearly be supportable.
> >
> > Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing
> > the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)
> > could work well for both protos. Let me give it a try first.
>
> Great, thanks.
>
> > >
> > >
> > > * On queueing packets to userspace:
> > >
> > > > > the current behavior is to just queue to the sk_error_queue as long
> > > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > > > > is regardless of the sk_tsflags. "
> > >
> > > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > > read the skb from the errqueue but are not able to parse the
> > > > timestamps
> >
> > Above is what I tried to explain how the application timestamping
> > feature works, not what I tried to implement for the BPF extension.
> >
> > >
> > > Before queuing a packet to userspace on the error queue, the relevant
> > > reporting flag is always tested. sock_recv_timestamp has:
> > >
> > >         /*
> > >          * generate control messages if
> > >          * - receive time stamping in software requested
> > >          * - software time stamp available and wanted
> > >          * - hardware time stamps available and wanted
> > >          */
> > >         if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> > >             (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> > >             (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> > >             (hwtstamps->hwtstamp &&
> > >              (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> > >                 __sock_recv_timestamp(msg, sk, skb);
> > >
> > > Otherwise applications could get error messages queued, and
> > > epoll/poll/select would unexpectedly behave differently.
> >
> > Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE
> > flag for BPF.
>
> Can you elaborate on this? This sounds like it would go against the
> intent to have the two versions of the API (application and BPF) be
> equivalent.

Oh, I see what you mean here. I have no preference. Well, I can add
this report flag into the BPF extension like how application
timestamping works.

>
> > >
> > > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> > > > features including cmsg mode. But it will not be used in bpf mode.
> > >
> > > For simplicity, the two uses of the API are best kept identical. If
> > > there is a technical reason why BPF has to diverge from established
> > > behavior, this needs to be explicitly called out in the commit
> > > message.
> > >
> > > Also, if you want to extend the API for BPF in the future, good to
> > > call this out now and ideally extensions will apply to both, to
> > > maintain a uniform API.
> >
> > As you said, I also agree on "two uses of the API are best kept identical".
> >
> > >
> > >
> > > * On extra measurement points, at sendmsg or tcp_write_xmit:
> > >
> > > The first is interesting. For application timestamping, this was
> > > never needed, as the application can just call clock_gettime before
> > > sendmsg.
> >
> > Yes, we could add it after we finish the current series. I'm going to
> > write it down on my todo list.
> >
> > >
> > > In general, additional measurement points are not only useful if the
> > > interval between is not constant. So far, we have seen no need for
> > > any additional points.
> >
> > Taking a snapshot of tcp_write_xmit() could be useful especially when
> > the skb is not transmitted due to nagle algorithm.
> >
> > >
> > >
> > > * On skb state:
> > >
> > > > > For now, is there thing we can explore to share in the skb_shared_info?
> > >
> > > skb_shinfo space is at a premium. I don't think we can justify two
> > > extra fields just for this use case.
> > >
> > > > My initial thought is just to reuse these fields in skb. It can work
> > > > without interfering one another.
> > >
> > > I'm skeptical that two methods can work at the same time. If they are
> > > started at different times, their sk_tskey will be different, for one.
> >
> > Right, sk_tskey is the only special one that I will take care of.
> > Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be
> > reused.
> >
> > >
> > > There may be workarounds. Maybe BPF can store its state in some BPF
> > > specific field, indeed. Or perhaps it can store per-sk shadow state
> > > that resolves the conflict. For instance, the offset between sk_tskey
> > > and bpf_tskey.
> >
> > Things could get complicated in the future if we want to unified the
> > final tskey value for all the cases. Since 1) the value of
> > shinfo->tskey depends on skb seq and len, 2) the final tskey output is
> > the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in
> > struct sock and related output logic for bpf without caring if it's
> > the same as sk_tskey.
>
> I think we can add fields to struct sock without too much concern.
> Adding fields to sk_buff or skb_shared_info would be more difficult.

Got it:)

>
> > That said, the outputs from two methods differ. Do you think it is
> > acceptable? It could be simpler and easier if we keep them identical.
>
> Since we can only have one skb_shared_info.tskey, if both user and bpf
> request OPT_ID, starting at different times, then we will have two
> bases against which to compute the difference. Having two fields in
> struct sock should suffice.

Exactly! I will do it.

Thanks,
Jason