Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf extension work

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Wed, 5 Feb 2025 16:47:04 -0800

On 2/5/25 4:12 PM, Jason Xing wrote:
On Thu, Feb 6, 2025 at 5:57 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

On 2/4/25 5:57 PM, Jakub Kicinski wrote:
On Wed,  5 Feb 2025 02:30:22 +0800 Jason Xing wrote:
+    if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
+        SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
+            struct skb_shared_info *shinfo = skb_shinfo(skb);
+            struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+
+            tcb->txstamp_ack_bpf = 1;
+            shinfo->tx_flags |= SKBTX_BPF;
+            shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+    }

If BPF program is attached we'll timestamp all skbs? Am I reading this
right?

If the attached bpf program explicitly turns on the SK_BPF_CB_TX_TIMESTAMPING
bit of a sock, then all skbs of this sock will be tx timestamp-ed.

Martin, I'm afraid it's not like what you expect. Only the last
portion of the sendmsg will enter the above function which means if
the size of sendmsg is large, only the last skb will be set SKBTX_BPF
and be timestamped.

Sure. The last skb of a large msg and more skb of small msg (or MSG_EOR).

My point is, only attaching a bpf alone is not enough. The 
SK_BPF_CB_TX_TIMESTAMPING still needs to be turned on.

Wouldn't it be better to let BPF_SOCK_OPS_TS_SND_CB return whether it's
interested in tracing current packet all the way thru the stack?

I like this idea. It can give the BPF prog a chance to do skb sampling on a
particular socket.

The return value of BPF_SOCK_OPS_TS_SND_CB (or any cgroup BPF prog return value)
already has another usage, which its return value is currently enforced by the
verifier. It is better not to convolute it further.

I don't prefer to add more use cases to skops->reply either, which is an union
of args[4], such that later progs (in the cgrp prog array) may lose the args value.

Jason, instead of always setting SKBTX_BPF and txstamp_ack_bpf in the kernel, a
new BPF kfunc can be added so that the BPF prog can call it to selectively set
SKBTX_BPF and txstamp_ack_bpf in some skb.

Agreed because at netdev 0x19 I have an explicit plan to share the
experience from our company about how to trace all the skbs which were
completed through a kernel module. It's how we use in production
especially for debug or diagnose use.

This is fine. The bpf prog can still do that by calling the kfunc. I don't see 
why move the bit setting into kfunc makes the whole set won't work.

I'm not knowledgeable enough about BPF, so I'd like to know if there
are some functions that I can take as good examples?

I think it's a standalone and good feature, can I handle it after this series?

Unfortunately, no. Once the default is on, this cannot be changed.

I think Jakub's suggestion to allow bpf prog selectively choose skb to timestamp 
is useful, so I suggested a way to do it.