Re: [PATCH bpf-next v8 12/12] selftests/bpf: add simple bpf tests in the tx path for timestamping feature

Jason Xing <kerneljasonxing@xxxxxxxxx> · Thu, 6 Feb 2025 10:14:50 +0800

On Thu, Feb 6, 2025 at 9:28 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 2/5/25 8:08 AM, Jason Xing wrote:
> >>> +     switch (skops->op) {
> >>> +     case BPF_SOCK_OPS_TS_SCHED_OPT_CB:
> >>> +             delay = val->sched_delay = timestamp - val->sendmsg_ns;
> >>> +             break;
> >>
> >> For a test this is fine. But just a reminder that in general a packet
> >> may pass through multiple qdiscs. For instance with bonding or tunnel
> >> virtual devices in the egress path.
> >
> > Right, I've seen this in production (two times qdisc timestamps
> > because of bonding).
> >
> >>
> >>> +     case BPF_SOCK_OPS_TS_SW_OPT_CB:
> >>> +             prior_ts = val->sched_delay + val->sendmsg_ns;
> >>> +             delay = val->sw_snd_delay = timestamp - prior_ts;
> >>> +             break;
> >>> +     case BPF_SOCK_OPS_TS_ACK_OPT_CB:
> >>> +             prior_ts = val->sw_snd_delay + val->sched_delay + val->sendmsg_ns;
> >>> +             delay = val->ack_delay = timestamp - prior_ts;
> >>> +             break;
> >>
> >> Similar to the above: fine for a test, but in practice be aware that
> >> packets may be resent, in which case an ACK might precede a repeat
> >> SCHED and SND. And erroneous or malicious peers may also just never
> >> send an ACK. So this can never be relied on in production settings,
> >> e.g., as the only signal to clear an entry from a map (as done in the
> >> branch below).
>
> All good points. I think all these notes should be added as comment to the test.

Got it, I will add them in the commit message.

> I think as a test, this will be a good start and can use some followup to
> address the cases.

Good idea.

>
> >
> > Agreed. In production, actually what we do is print all the timestamps
> > and let an agent parse them.
>
> The BPF program that runs in the kernel can provide its own user interface that
> best fits its environment. If a raw printing interface is sufficient, that works
> well and is simple on the BPF program side. If the production system cannot
> afford the raw printing cost, the bpf prog can perform some aggregation first.
>
> The BPF program should be able to detect when an outgoing skb is re-transmitted
> and act accordingly. There is BPF timer to retire entries for which no ACK has
> been received.

Oh, first time to know the BPF timer.

>
> Potentially, this data can be aggregated into the individual bpf_sk_storage or
> using a BPF map keyed by a particular IP address prefix.
>
> I just want to highlight here for people in the future referencing this thread
> to look for implementation ideas.

Thanks, I think they are useful! I will copy more description in the
commit message.

Thanks,
Jason