On Fri, Nov 1, 2024 at 9:32 PM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > Martin KaFai Lau wrote: > > On 10/31/24 6:50 AM, Jason Xing wrote: > > > On Thu, Oct 31, 2024 at 8:30 PM Willem de Bruijn > > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > >> > > >> Jason Xing wrote: > > >>> On Thu, Oct 31, 2024 at 2:27 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > >>>> > > >>>> On 10/30/24 5:13 PM, Jason Xing wrote: > > >>>>> I realized that we will have some new sock_opt flags like > > >>>>> TS_SCHED_OPT_CB in patch 4, so we can control whether to print or > > >>>>> not... For each sock_opt point, they will be called without caring if > > >>>>> related flags in skb are set. Well, it's meaningless to add more > > >>>>> control of skb tsflags at each TS_xx_OPT_CB point. > > >>>>> > > >>>>> Am I understanding in a correct way? Now, I'm totally fine with this great idea! > > >>>> Yes, I think so. > > >>>> > > >>>> The sockops prog can choose to ignore any BPF_SOCK_OPS_TS_*_CB. The are only 3: > > >>>> SCHED, SND, and ACK. If the hwtstamp is available from a NIC, I think it would > > >>>> be quite wasteful to throw it away. ACK can be controlled by the > > >>>> TCP_SKB_CB(skb)->bpf_txstamp_ack. > > >>> > > >>> Right, let me try this:) > > >>> > > >>>> Going back to my earlier bpf_setsockopt(SOL_SOCKET, BPF_TX_TIMESTAMPING) > > >>>> comment. I think it may as well go back to use the "u8 > > >>>> bpf_sock_ops_cb_flags;" and use the bpf_sock_ops_cb_flags_set() helper to > > >>>> enable/disable the timestamp related callback hook. May be add one > > >>>> BPF_SOCK_OPS_TX_TIMESTAMPING_CB_FLAG. > > >>> > > >>> bpf_sock_ops_cb_flags this flag is only used in TCP condition, right? > > >>> If that is so, it cannot be suitable for UDP. > > >>> > > >>> I'm thinking of this solution: > > >>> 1) adding a new flag in SOF_TIMESTAMPING_OPT_BPF flag (in > > >>> include/uapi/linux/net_tstamp.h) which can be used by sk->sk_tsflags > > > > probably not in include/uapi/linux/net_tstamp.h. This flag can only be used by a > > bpf prog (meaning will not be used by user space syscall). More below. > > > > >>> 2) flags = SOF_TIMESTAMPING_OPT_BPF; bpf_setsockopt(skops, > > >>> SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)); > > >>> 3) test if sk->sk_tsflags has this new flag in tcp_tx_timestamp() or > > >>> in udp_sendmsg() > > >>> ... > > > > Not sure how many churns/audits is needed to ensure the user space cannot > > set/clear the SOF_TIMESTAMPING_OPT_BPF bit in sk->sk_tsflags. Could be not much. > > Stores are limited to defined bits with the following in > sock_set_timestamping > > if (val & ~SOF_TIMESTAMPING_MASK) > return -EINVAL; > > > May be it is cleaner to leave the sk->sk_tsflags for user space only and having > > a separate field in "struct sock" to track bpf specific needs. More like your > > current sk_tsflags_bpf approach but I was thinking to reuse the > > bpf_sock_ops_cb_flags instead. e.g. "BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)" is used to check if it needs to call a bpf > > prog to decide if it needs to add tcp header option. Here we want to test if it > > should call a bpf prog to make a decision on tx timestamp on a skb. > > > > The bpf_sock_ops_cb_flags can be moved from struct tcp_sock to struct sock. It > > is doable from the bpf side. > > > > All that said, but, yes, it will add some TCP specific enum flag (e.g. > > BPF_SOCK_OPS_RTO_CB_FLAG) to the struct sock which will not be used by > > UDP/raw/...etc, so may be keep your current sk_tsflags_bpf approach but rename > > it to sk_bpf_cb_flags in struct "sock" so that it can be reused for other non > > tstamp ops in the future? probably a u8 is enough. > > > > This optname is used by the bpf prog only and not usable by user space syscall. > > If it prefers to stay with bpf_setsockopt (which is fine), it needs a bpf > > specific optname like the current TCP_BPF_SOCK_OPS_CB_FLAGS which currently sets > > the tp->bpf_sock_ops_cb_flags. May be a new SK_BPF_CB_FLAGS optname for setting > > the sk->sk_bpf_cb_flags, like bpf_setsockopt(skops_ctx, SOL_SOCKET, > > SK_BPF_CB_FLAGS, &val, sizeof(val)) and handle it in the sol_socket_sockopt() > > alone without calling into sk_{set,get}sockopt. Add a new enum for the optval > > for the sk_bpf_cb_flags: > > > > enum { > > SK_BPF_CB_TX_TIMESTAMPING = (1 << 0), > > SK_BPF_CB_RX_TIEMSTAMPING = (1 << 1), > > }; > > > > >> > > >> On using the raw seqno: this data is accessible to anyone root in > > >> namespace (ns_capable) using packet sockets, so as long as it does not > > >> open to more than that, it is logically equivalent to the current > > >> setting. > > >> > > >> With seqno the BPF program has to be careful that the same seqno can > > >> be retransmitted, so for instance seeing an ACK before a (second) SND > > >> must be anticipated. That is true for SO_TIMESTAMPING today too. > > > > Ah. It will be a very useful comment to add to the selftests bpf prog. > > > > >> > > >> For datagrams (UDP as well as RAW and many non IP protocols), an > > >> alternative still needs to be found. > > > > In udp/raw/..., I don't know how likely is the user space having "cork->tx_flags > > & SKBTX_ANY_TSTAMP" set but has neither "READ_ONCE(sk->sk_tsflags) & > > SOF_TIMESTAMPING_OPT_ID" nor "cork->flags & IPCORK_TS_OPT_ID" set. > > This is not something to rely on. OPT_ID was added relatively recently. > Older applications, or any that just use the most straightforward API, > will not set this. > > > If it is > > unlikely, may be we can just disallow bpf prog from directly setting > > skb_shinfo(skb)->tskey for this particular skb. > > > > For all other cases, in __ip[6]_append_data, directly call a bpf prog and also > > pass the kernel decided tskey to the bpf prog. > > > > The kernel passed tskey could be 0 (meaning the user space has not used it). The > > bpf prog can give one for the kernel to use. The bpf prog can store the > > sk_tskey_bpf in the bpf_sk_storage now. Meaning no need to add one to the struct > > sock. The bpf prog does not have to start from 0 (e.g. start from U32_MAX > > instead) if it helps. > > > > If the kernel passed tskey is not 0, the bpf prog can just use that one > > (assuming the user space is doing something sane, like the value in > > SCM_TS_OPT_ID won't be jumping back and front between 0 to U32_MAX). I hope this > > is very unlikely also (?) but the bpf prog can probably detect this and choose > > to ignore this sk. > > If an applications uses OPT_ID, it is unlikely that they will toggle > the feature on and off on a per-packet basis. So in the common case > the program could use the user-set counter or use its own if userspace > does not enable the feature. In the rare case that an application does > intermittently set an OPT_ID, the numbering would be erratic. This > does mean that an actively malicious application could mess with admin > measurements. > Sorry, I got lost in this part. What would you recommend I should do about OPT_ID in the next move? Should I keep those three OPT_ID patches? Thanks, Jason