Re: [PATCH bpf-next v8 12/12] selftests/bpf: add simple bpf tests in the tx path for timestamping feature

Jason Xing <kerneljasonxing@xxxxxxxxx> · Thu, 6 Feb 2025 00:08:41 +0800

On Wed, Feb 5, 2025 at 11:54 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > Bpf prog calculates a couple of latency delta between each tx points
> > which SO_TIMESTAMPING feature has already implemented. It can be used
> > in the real world to diagnose the behaviour in the tx path.
> >
> > Also, check the safety issues by accessing a few bpf calls in
> > bpf_test_access_bpf_calls().
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
>
> > +static bool bpf_test_delay(struct bpf_sock_ops *skops, const struct sock *sk)
> > +{
> > +     struct bpf_sock_ops_kern *skops_kern;
> > +     u64 timestamp = bpf_ktime_get_ns();
> > +     struct skb_shared_info *shinfo;
> > +     struct delay_info dinfo = {0};
> > +     struct sk_tskey key = {0};
> > +     struct delay_info *val;
> > +     struct sk_buff *skb;
> > +     struct sk_stg *stg;
> > +     u64 prior_ts, delay;
> > +
> > +     if (bpf_test_access_bpf_calls(skops, sk))
> > +             return false;
> > +
> > +     skops_kern = bpf_cast_to_kern_ctx(skops);
> > +     skb = skops_kern->skb;
> > +     shinfo = bpf_core_cast(skb->head + skb->end, struct skb_shared_info);
> > +     key.tskey = shinfo->tskey;
> > +     if (!key.tskey)
> > +             return false;
> > +
> > +     key.cookie = bpf_get_socket_cookie(skops);
> > +     if (!key.cookie)
> > +             return false;
> > +
> > +     if (skops->op == BPF_SOCK_OPS_TS_SND_CB) {
> > +             stg = bpf_sk_storage_get(&sk_stg_map, (void *)sk, 0, 0);
> > +             if (!stg)
> > +                     return false;
> > +             dinfo.sendmsg_ns = stg->sendmsg_ns;
> > +             bpf_map_update_elem(&time_map, &key, &dinfo, BPF_ANY);
> > +             goto out;
> > +     }
> > +
> > +     val = bpf_map_lookup_elem(&time_map, &key);
> > +     if (!val)
> > +             return false;
> > +
> > +     switch (skops->op) {
> > +     case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> > +             delay = val->sched_delay = timestamp - val->sendmsg_ns;
> > +             break;
>
> For a test this is fine. But just a reminder that in general a packet
> may pass through multiple qdiscs. For instance with bonding or tunnel
> virtual devices in the egress path.

Right, I've seen this in production (two times qdisc timestamps
because of bonding).

>
> > +     case BPF_SOCK_OPS_TS_SW_OPT_CB:
> > +             prior_ts = val->sched_delay + val->sendmsg_ns;
> > +             delay = val->sw_snd_delay = timestamp - prior_ts;
> > +             break;
> > +     case BPF_SOCK_OPS_TS_ACK_OPT_CB:
> > +             prior_ts = val->sw_snd_delay + val->sched_delay + val->sendmsg_ns;
> > +             delay = val->ack_delay = timestamp - prior_ts;
> > +             break;
>
> Similar to the above: fine for a test, but in practice be aware that
> packets may be resent, in which case an ACK might precede a repeat
> SCHED and SND. And erroneous or malicious peers may also just never
> send an ACK. So this can never be relied on in production settings,
> e.g., as the only signal to clear an entry from a map (as done in the
> branch below).

Agreed. In production, actually what we do is print all the timestamps
and let an agent parse them.

>
> > +     }
> > +
> > +     if (delay >= delay_tolerance_nsec)
> > +             return false;
> > +
> > +     /* Since it's the last one, remove from the map after latency check */
> > +     if (skops->op == BPF_SOCK_OPS_TS_ACK_OPT_CB)
> > +             bpf_map_delete_elem(&time_map, &key);
> > +
> > +out:
> > +     return true;
> > +}
> > +