Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to work parallelly

Jason Xing <kerneljasonxing@xxxxxxxxx> · Thu, 31 Oct 2024 07:54:57 +0800

On Thu, Oct 31, 2024 at 1:15 AM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > On Wed, Oct 30, 2024 at 1:37 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> > >
> > > On 10/29/24 8:04 PM, Jason Xing wrote:
> > > >>>>>>>    static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > > >>>>>>>                                 const struct sk_buff *ack_skb,
> > > >>>>>>>                                 struct skb_shared_hwtstamps *hwtstamps,
> > > >>>>>>> @@ -5549,6 +5575,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > > >>>>>>>        u32 tsflags;
> > > >>>>>>>
> > > >>>>>>>        tsflags = READ_ONCE(sk->sk_tsflags);
> > > >>>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> > > >>>>>>
> > > >>>>>> I still don't get this part since v2. How does it work with cmsg only
> > > >>>>>> SOF_TIMESTAMPING_TX_*?
> > > >>>>>>
> > > >>>>>> I tried with "./txtimestamp -6 -c 1 -C -N -L ::1" and it does not return any tx
> > > >>>>>> time stamp after this patch.
> > > >>>>>>
> > > >>>>>> I am likely missing something
> > > >>>>>> or v2 concluded that this behavior change is acceptable?
> > > >>>>>
> > > >>>>> Sorry, I submitted this series accidentally removing one important
> > > >>>>> thing which is similar to what Vadim Fedorenko mentioned in the v1
> > > >>>>> [1]:
> > > >>>>> adding another member like sk_flags_bpf to handle the cmsg case.
> > > >>>>>
> > > >>>>> Willem, would it be acceptable to add another field in struct sock to
> > > >>>>> help us recognise the case where BPF and cmsg works parallelly?
> > > >>>>>
> > > >>>>> [1]: https://lore.kernel.org/all/662873cb-a897-464e-bdb3-edf01363c3b2@xxxxxxxxx/
> > > >>>>
> > > >>>> The current timestamp flags don't need a u32. Maybe just reserve a bit
> > > >>>> for this purpose?
> > > >>>
> > > >>> Sure. Good suggestion.
> > > >>>
> > > >>> But I think only using one bit to reflect whether the sk->sk_tsflags
> > > >>> is used by normal or cmsg features is not enough. The existing
> > > >>> implementation in tcp_sendmsg_locked() doesn't override the
> > > >>> sk->sk_tsflags even the normal and cmsg features enabled parallelly.
> > > >>> It only overrides sockc.tsflags in tcp_sendmsg_locked(). Based on
> > > >>> that, even if at some point users suddenly remove the cmsg use and
> > > >>> then the prior normal SO_TIMESTAMPING continues to work.
> > > >>>
> > > >>> How about this, please see below:
> > > >>> For now, sk->sk_tsflags only uses 17 bits (see the last one
> > > >>> SOF_TIMESTAMPING_OPT_RX_FILTER). The cmsg feature only uses 4 flags
> > > >>> (see SOF_TIMESTAMPING_TX_RECORD_MASK in __sock_cmsg_send()). With that
> > > >>> said, we could reserve the highest four bits for cmsg use for the
> > > >>> moment. Four bits represents four points where we can record the
> > > >>> timestamp in the tx case.
> > > >>>
> > > >>> Do you agree on this point?
> > > >>
> > > >> I don't follow.
> > > >>
> > > >> I probably miss the entire point.
> > > >>
> > > >> The goal for sockcm fields is to start with the sk field and
> > > >> optionally override based on cmsg. This is what sockcm_init does for
> > > >> tsflags.
> > > >>
> > > >> This information is for the skb, so these are recording flags.
> > > >>
> > > >> Why does the new datapath need to know whether features are enabled
> > > >> through setsockopt or on a per-call basis with a cmsg?
> > > >>
> > > >> The goal was always to keep the reporting flags per socket, but make
> > > >> the recording flag per packet, mainly for sampling.
> > > >
> > > > If a user uses 1) cmsg feature, 2) bpf feature at the same time, we
> > > > allow each feature to work independently.
> > > >
> > > > How could it work? It relies on sk_tstamp_tx_flags() function in the
> > > > current patch: when we are in __skb_tstamp_tx(), we cannot know which
> > > > flags in each feature are set without fetching sk->sk_tsflags and
> > > > sk->sk_tsflags_bpf. Then we are able to know what timestamp we want to
> > > > record. To put it in a simple way, we're not sure if the user wants to
> > > > see a SCHED timestamp by using the cmsg feature in __skb_tstamp_tx()
> > > > if we hit this test statement "skb_shinfo(skb)->tx_flags &
> > > > SKBTX_SCHED_TSTAMP)". So we need those two socket tsflag fields to
> > > > help us.
> > >
> > > I also don't see how a new bit/integer in a sk can help to tell the per cmsg
> > > on/off. This cmsg may have tx timestamp on while the next cmsg can have it off.
> >
> > It's not hard to use it because we can clear every socket cmsg tsflags
> > when we're done the check in tcp_sendmsg_locked() if the cmsg feature
> > is not enabled. Then we can accurately know which timestamp should we
> > print in the tx path.
> >
> > >
> > > There is still one bit in skb_shinfo(skb)->tx_flags. How about define a
> > > SKBTX_BPF for everything. imo, the fine control on
> > > SOF_TIMESTAMPING_TX_{SCHED,SOFTWARE} is not useful for bpf. Almost all of the
> > > time the bpf program wants all available time stamps (sched, software, and
> > > hwtstamp if the NIC has it).
>
> I like the approach of just calling BPF on every hook. Assuming that
> the call is very cheap, which AFAIK is true.
>
> In that case we don't need complex branching in C to optionally skip
> this step, as we do for reporting to userspace.
>
> All the logic and complexity is in the BPF program itself.
>
> We obviously then let go of the goal to model the BPF API close to the
> existing SO_TIMESTAMPING API. Though I advocated for keeping them
> aligned, I also think we should just tailor it to what makes most
> sense in the BPF space.
>
> > Sorry, I really doubt that we can lose the fine control.
>
> Since BPF is called at each reporting point, no control is lost,
> actually.

Sorry, I still don't get it :( If there is something wrong with my
understanding, please correct me.

BPF is only called on every sock_opt point in this case, like
BPF_SOCK_OPS_TCP_CONNECT_CB, not every report point of
SO_TIMESTAMPING. If we add check to test if skb is set SKBTX_BPF in
__skb_tstamp_tx(), then at every point bpf will be called. But it's
different from SO_TIMESTAMPING drived by each bit (SCHED/TX_SOFTWARE)
to control each point. My question is if we would use SKBTX_BPF for
everything, how could we control and know when we hit
SCHED/TX_SOFTWARE/ACK time from the bpf programs' perspective? Only
one bit... It will print everything without the ability to control.

Then if we try the SKBTX_BPF approach, it seems we don't actually
insist on adding a test statement in __skb_tstamp_tx(). Instead, we
could add into more places (by only checking the SKBTX_BPF flag), say,
tcp_write_xmit(), right?

I'm not saying I'm opposed to this idea. Instead I think it's very
useful, just a few questions haunting me...

Thanks,
Jason