Re: [PATCH bpf-next v11 11/12] bpf: support selective sampling for bpf timestamping

Jason Xing <kerneljasonxing@xxxxxxxxx> · Sun, 16 Feb 2025 05:11:11 +0800

On Sun, Feb 16, 2025 at 2:01 AM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > On Sat, Feb 15, 2025 at 11:10 PM Willem de Bruijn
> > <willemdebruijn.kernel@xxxxxxxxx> wrote:
> > >
> > > Jason Xing wrote:
> > > > Add the bpf_sock_ops_enable_tx_tstamp kfunc to allow BPF programs to
> > > > selectively enable TX timestamping on a skb during tcp_sendmsg().
> > > >
> > > > For example, a BPF program can limit tracking to X packets and
> > > > then stop, instead of tracing every sendmsg of the matched flow
> > > > all along. This helps users who cannot afford to calculate
> > > > latencies for every sendmsg call, e.g. due to performance or
> > > > storage constraints.
> > > >
> > > > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > > > ---
> > > >  kernel/bpf/btf.c  |  1 +
> > > >  net/core/filter.c | 33 ++++++++++++++++++++++++++++++++-
> > > >  2 files changed, 33 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index 9433b6467bbe..740210f883dc 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -8522,6 +8522,7 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
> > > >       case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
> > > >       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> > > >       case BPF_PROG_TYPE_CGROUP_SYSCTL:
> > > > +     case BPF_PROG_TYPE_SOCK_OPS:
> > > >               return BTF_KFUNC_HOOK_CGROUP;
> > > >       case BPF_PROG_TYPE_SCHED_ACT:
> > > >               return BTF_KFUNC_HOOK_SCHED_ACT;
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 7f56d0bbeb00..3b4c1e7b1470 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -12102,6 +12102,27 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
> > > >  #endif
> > > >  }
> > > >
> > > > +__bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> > > > +                                           u64 flags)
> > > > +{
> > > > +     struct sk_buff *skb;
> > > > +     struct sock *sk;
> > > > +
> > > > +     if (skops->op != BPF_SOCK_OPS_TS_SND_CB)
> > > > +             return -EOPNOTSUPP;
> > > > +
> > > > +     if (flags)
> > > > +             return -EINVAL;
> > > > +
> > > > +     skb = skops->skb;
> > > > +     sk = skops->sk;
> > >
> > > nit: not used
> >
> > BPF programs can use this in the future if necessary whereas the
> > selftests don't reflect it.
>
> How does defining a local variable help there?

Sorry, I didn't state that clearly. You're right: for now it is unused.
I had possible future use in mind, but I will remove it.

>
> > >
> > > > +     skb_shinfo(skb)->tx_flags |= SKBTX_BPF;
> > > > +     TCP_SKB_CB(skb)->txstamp_ack |= TSTAMP_ACK_BPF;
> > > > +     skb_shinfo(skb)->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >
> > > Can this overwrite the seqno previously calculated by tcp_tx_timestamp?
> >
> > seqno? If you are referring to seqno, I don't think the BPF program is
> > allowed to modify it because SOCK_OPS_GET_OR_SET_FIELD() only supports
> > overwriting sk_txhash. Please see sock_ops_convert_ctx_access().
>
> I meant tskey

It does 'overwrite' the tskey here if the socket timestamping feature
is also enabled. But I think seq and len do not change between
tcp_tx_timestamp() and bpf_sock_ops_enable_tx_tstamp(). If seq and len
don't change, the tskey is never actually overwritten with a different
value. Or were you expecting something like this:

if (!skb_shinfo(skb)->tskey)
        skb_shinfo(skb)->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
?
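To make the argument concrete, here is a rough userspace model, for
illustration only. struct model_skb and all model_* functions are
hypothetical stand-ins, not kernel code, and the model assumes (as
argued above) that seq and len stay the same between the two calls.
It compares the unconditional assignment in the patch with the
`if (!tskey)` variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. In the kernel
 * these are TCP_SKB_CB(skb)->seq, skb->len and skb_shinfo(skb)->tskey. */
struct model_skb {
	uint32_t seq;   /* models TCP_SKB_CB(skb)->seq */
	uint32_t len;   /* models skb->len */
	uint32_t tskey; /* models skb_shinfo(skb)->tskey, 0 = not set yet */
};

/* Key of the last byte in the skb: seq + len - 1, wrapping mod 2^32. */
static uint32_t model_tskey(const struct model_skb *skb)
{
	assert(skb->len > 0); /* model assumption: only non-empty skbs */
	return skb->seq + skb->len - 1;
}

/* Models the socket timestamping path (tcp_tx_timestamp()) setting tskey. */
static void model_tcp_tx_timestamp(struct model_skb *skb)
{
	skb->tskey = model_tskey(skb);
}

/* Models the kfunc as posted: always (re)computes tskey. */
static void model_bpf_enable_unconditional(struct model_skb *skb)
{
	skb->tskey = model_tskey(skb);
}

/* Models the alternative above: only sets tskey if it is still 0. */
static void model_bpf_enable_if_unset(struct model_skb *skb)
{
	if (!skb->tskey)
		skb->tskey = model_tskey(skb);
}
```

With seq = 1000 and len = 500, both paths end up with tskey = 1499,
whether or not tcp_tx_timestamp() ran first. The two variants only
diverge if seq or len changes between the calls. One caveat of the
`!tskey` check is that 0 doubles as the "unset" marker, so a key that
legitimately wraps to 0 would just be recomputed to the same value.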