Re: [PATCH net-next v2 06/12] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp

Jason Xing <kerneljasonxing@xxxxxxxxx> · Wed, 16 Oct 2024 09:24:05 +0800

On Wed, Oct 16, 2024 at 9:01 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 10/11/24 9:06 PM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> >
> > Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to
> > print timestamps when the skb just passes the dev layer.
> >
> > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> > ---
> >   include/uapi/linux/bpf.h       |  5 +++++
> >   net/core/skbuff.c              | 17 +++++++++++++++--
> >   tools/include/uapi/linux/bpf.h |  5 +++++
> >   3 files changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 157e139ed6fc..3cf3c9c896c7 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7019,6 +7019,11 @@ enum {
> >                                        * by the kernel or the
> >                                        * earlier bpf-progs.
> >                                        */
> > +     BPF_SOCK_OPS_TS_SCHED_OPT_CB,   /* Called when skb is passing through
> > +                                      * dev layer when SO_TIMESTAMPING
> > +                                      * feature is on. It indicates the
> > +                                      * recorded timestamp.
> > +                                      */
> >   };
> >
> >   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 3a4110d0f983..16e7bdc1eacb 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5632,8 +5632,21 @@ static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
> >               return;
> >
> >       tp = tcp_sk(sk);
> > -     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
> > -             return;
> > +     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
> > +             struct timespec64 tstamp;
> > +             u32 cb_flag;
> > +
> > +             switch (tstype) {
> > +             case SCM_TSTAMP_SCHED:
> > +                     cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
> > +                     break;
> > +             default:
> > +                     return;
> > +             }
> > +
> > +             tstamp = ktime_to_timespec64(ktime_get_real());
> > +             tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);
>
> There is bpf_ktime_get_*() helper. The bpf prog can directly call the
> bpf_ktime_get_* helper and use whatever clock it sees fit instead of enforcing
> real clock here and doing an extra ktime_to_timespec64. Right now the
> bpf_ktime_get_*() does not have real clock which I think it can be added.

In that case, there is no need to add a tcp_call_bpf_*arg() call to pass
the timestamp to userspace, right? We can let the bpf program fetch it.

Now I wonder what information I should pass instead. Sorry for my lack
of BPF-related knowledge :(
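
To make the suggestion concrete, here is a minimal userspace sketch of
the shape being proposed: the hook only says which event happened, and
the handler reads whichever clock it prefers. This is not kernel or BPF
code. The names ts_event, read_clock_ns and handle_ts_event are
hypothetical, and clock_gettime() only stands in for the
bpf_ktime_get_*() helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical event kinds; only the SCHED case appears in this patch. */
enum ts_event { TS_EVENT_SCHED = 1 };

/* Read the given clock and return nanoseconds, or 0 on failure. */
static uint64_t read_clock_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return 0;
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical handler: the caller passes no timestamp at all, only the
 * event kind. The handler picks the clock, so nothing forces the real
 * clock on it.
 */
static uint64_t handle_ts_event(enum ts_event ev, clockid_t clk)
{
	if (ev != TS_EVENT_SCHED)
		return 0;	/* unknown event: nothing to report */
	return read_clock_ns(clk);
}
```

With this shape, a handler that wants monotonic time passes
CLOCK_MONOTONIC and one that wants wall time passes CLOCK_REALTIME,
without the hook itself changing.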

>
> I think overall the tstamp reporting interface does not necessarily have to
> follow the socket API. The bpf prog is running in the kernel. It could pass
> other information to the bpf prog if it sees fit. e.g. the bpf prog could also
> get the original transmitted tcp skb if it is useful.

Good to know that! But how can the BPF program parse the skb when
tcp_call_bpf_2arg() only passes u32 parameters?
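
To illustrate the u32 limitation, here is a small userspace sketch of
what the two arguments in this patch can carry. It assumes the receiver
rebuilds the timestamp from the two values. struct ts_args, ts_pack and
ts_unpack_ns are hypothetical names used only for this illustration,
not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical container for the two u32 arguments. */
struct ts_args {
	uint32_t arg1;	/* low 32 bits of the seconds value */
	uint32_t arg2;	/* nanoseconds within the second */
};

/* Squeeze a 64-bit seconds value and a nanosecond count into two u32s. */
static struct ts_args ts_pack(int64_t sec, long nsec)
{
	struct ts_args a;

	assert(nsec >= 0 && (uint64_t)nsec < NSEC_PER_SEC);
	a.arg1 = (uint32_t)sec;		/* truncates: upper 32 bits are dropped */
	a.arg2 = (uint32_t)nsec;	/* always fits, nsec < 1e9 */
	return a;
}

/* Rebuild a nanosecond timestamp from the two u32 values. */
static uint64_t ts_unpack_ns(struct ts_args a)
{
	return (uint64_t)a.arg1 * NSEC_PER_SEC + a.arg2;
}
```

The seconds value survives only as long as it fits in 32 bits, and
there is no room left for anything else, a pointer included.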

Thanks,
Jason