Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to work parallelly

Jason Xing <kerneljasonxing@xxxxxxxxx> · Thu, 31 Oct 2024 08:13:42 +0800

On Thu, Oct 31, 2024 at 7:54 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> On Thu, Oct 31, 2024 at 1:15 AM Willem de Bruijn
> <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >
> > Jason Xing wrote:
> > > On Wed, Oct 30, 2024 at 1:37 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> > > >
> > > > On 10/29/24 8:04 PM, Jason Xing wrote:
> > > > >>>>>>>    static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > > > >>>>>>>                                 const struct sk_buff *ack_skb,
> > > > >>>>>>>                                 struct skb_shared_hwtstamps *hwtstamps,
> > > > >>>>>>> @@ -5549,6 +5575,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> > > > >>>>>>>        u32 tsflags;
> > > > >>>>>>>
> > > > >>>>>>>        tsflags = READ_ONCE(sk->sk_tsflags);
> > > > >>>>>>> +     if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> > > > >>>>>>
> > > > >>>>>> I still don't get this part since v2. How does it work with cmsg only
> > > > >>>>>> SOF_TIMESTAMPING_TX_*?
> > > > >>>>>>
> > > > >>>>>> I tried with "./txtimestamp -6 -c 1 -C -N -L ::1" and it does not return any tx
> > > > >>>>>> time stamp after this patch.
> > > > >>>>>>
> > > > >>>>>> I am likely missing something
> > > > >>>>>> or v2 concluded that this behavior change is acceptable?
> > > > >>>>>
> > > > >>>>> Sorry, I submitted this series accidentally removing one important
> > > > >>>>> thing which is similar to what Vadim Fedorenko mentioned in the v1
> > > > >>>>> [1]:
> > > > >>>>> adding another member like sk_flags_bpf to handle the cmsg case.
> > > > >>>>>
> > > > >>>>> Willem, would it be acceptable to add another field in struct sock to
> > > > >>>>> help us recognise the case where BPF and cmsg works parallelly?
> > > > >>>>>
> > > > >>>>> [1]: https://lore.kernel.org/all/662873cb-a897-464e-bdb3-edf01363c3b2@xxxxxxxxx/
> > > > >>>>
> > > > >>>> The current timestamp flags don't need a u32. Maybe just reserve a bit
> > > > >>>> for this purpose?
> > > > >>>
> > > > >>> Sure. Good suggestion.
> > > > >>>
> > > > >>> But I think only using one bit to reflect whether the sk->sk_tsflags
> > > > >>> is used by normal or cmsg features is not enough. The existing
> > > > >>> implementation in tcp_sendmsg_locked() doesn't override the
> > > > >>> sk->sk_tsflags even the normal and cmsg features enabled parallelly.
> > > > >>> It only overrides sockc.tsflags in tcp_sendmsg_locked(). Based on
> > > > >>> that, even if at some point users suddenly remove the cmsg use and
> > > > >>> then the prior normal SO_TIMESTAMPING continues to work.
> > > > >>>
> > > > >>> How about this, please see below:
> > > > >>> For now, sk->sk_tsflags only uses 17 bits (see the last one
> > > > >>> SOF_TIMESTAMPING_OPT_RX_FILTER). The cmsg feature only uses 4 flags
> > > > >>> (see SOF_TIMESTAMPING_TX_RECORD_MASK in __sock_cmsg_send()). With that
> > > > >>> said, we could reserve the highest four bits for cmsg use for the
> > > > >>> moment. Four bits represents four points where we can record the
> > > > >>> timestamp in the tx case.
> > > > >>>
> > > > >>> Do you agree on this point?
> > > > >>
> > > > >> I don't follow.
> > > > >>
> > > > >> I probably miss the entire point.
> > > > >>
> > > > >> The goal for sockcm fields is to start with the sk field and
> > > > >> optionally override based on cmsg. This is what sockcm_init does for
> > > > >> tsflags.
> > > > >>
> > > > >> This information is for the skb, so these are recording flags.
> > > > >>
> > > > >> Why does the new datapath need to know whether features are enabled
> > > > >> through setsockopt or on a per-call basis with a cmsg?
> > > > >>
> > > > >> The goal was always to keep the reporting flags per socket, but make
> > > > >> the recording flag per packet, mainly for sampling.
> > > > >
> > > > > If a user uses 1) cmsg feature, 2) bpf feature at the same time, we
> > > > > allow each feature to work independently.
> > > > >
> > > > > How could it work? It relies on sk_tstamp_tx_flags() function in the
> > > > > current patch: when we are in __skb_tstamp_tx(), we cannot know which
> > > > > flags in each feature are set without fetching sk->sk_tsflags and
> > > > > sk->sk_tsflags_bpf. Then we are able to know what timestamp we want to
> > > > > record. To put it in a simple way, we're not sure if the user wants to
> > > > > see a SCHED timestamp by using the cmsg feature in __skb_tstamp_tx()
> > > > > if we hit this test statement "skb_shinfo(skb)->tx_flags &
> > > > > SKBTX_SCHED_TSTAMP)". So we need those two socket tsflag fields to
> > > > > help us.
> > > >
> > > > I also don't see how a new bit/integer in a sk can help to tell the per cmsg
> > > > on/off. This cmsg may have tx timestamp on while the next cmsg can have it off.
> > >
> > > It's not hard to use it because we can clear every socket cmsg tsflags
> > > when we're done the check in tcp_sendmsg_locked() if the cmsg feature
> > > is not enabled. Then we can accurately know which timestamp should we
> > > print in the tx path.
> > >
> > > >
> > > > There is still one bit in skb_shinfo(skb)->tx_flags. How about define a
> > > > SKBTX_BPF for everything. imo, the fine control on
> > > > SOF_TIMESTAMPING_TX_{SCHED,SOFTWARE} is not useful for bpf. Almost all of the
> > > > time the bpf program wants all available time stamps (sched, software, and
> > > > hwtstamp if the NIC has it).
> >
> > I like the approach of just calling BPF on every hook. Assuming that
> > the call is very cheap, which AFAIK is true.
> >
> > In that case we don't need complex branching in C to optionally skip
> > this step, as we do for reporting to userspace.
> >
> > All the logic and complexity is in the BPF program itself.
> >
> > We obviously then let go of the goal to model the BPF API close to the
> > existing SO_TIMESTAMPING API. Though I advocated for keeping them
> > aligned, I also think we should just tailor it to what makes most
> > sense in the BPF space.
> >
> > > Sorry, I really doubt that we can lose the fine control.
> >
> > Since BPF is called at each reporting point, no control is lost,
> > actually.
>
> Sorry, I still don't get it :( If there is something wrong with my
> understanding, please correct me.
>
> BPF is only called on every sock_opt point in this case, like
> BPF_SOCK_OPS_TCP_CONNECT_CB, not every report point of
> SO_TIMESTAMPING. If we add check to test if skb is set SKBTX_BPF in
> __skb_tstamp_tx(), then at every point bpf will be called. But it's
> different from SO_TIMESTAMPING drived by each bit (SCHED/TX_SOFTWARE)
> to control each point. My question is if we would use SKBTX_BPF for
> everything, how could we control and know when we hit
> SCHED/TX_SOFTWARE/ACK time from the bpf programs' perspective? Only
> one bit... It will print everything without the ability to control.
>
> Then if we try the SKBTX_BPF approach, it seems we don't actually
> insist on adding a test statement in __skb_tstamp_tx(). Instead, we
> could add into more places (by only checking the SKBTX_BPF flag), say,
> tcp_write_xmit(), right?

I realized that we will have some new sock_opt flags like
TS_SCHED_OPT_CB in patch 4, so we can control whether to print or
not... For each sock_opt point, they will be called without caring if
related flags in skb are set. Well, it's meaningless to add more
control of skb tsflags at each TS_xx_OPT_CB point.

Am I understanding in a correct way? Now, I'm totally fine with this great idea!

Thanks,
Jason