Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to work parallelly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/31/24 6:50 AM, Jason Xing wrote:
On Thu, Oct 31, 2024 at 8:30 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:

Jason Xing wrote:
On Thu, Oct 31, 2024 at 2:27 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

On 10/30/24 5:13 PM, Jason Xing wrote:
I realized that we will have some new sock_opt flags like
TS_SCHED_OPT_CB in patch 4, so we can control whether to print or
not... For each sock_opt point, they will be called without caring if
related flags in skb are set. Well, it's meaningless to add more
control of skb tsflags at each TS_xx_OPT_CB point.

Am I understanding in a correct way? Now, I'm totally fine with this great idea!
Yes, I think so.

The sockops prog can choose to ignore any BPF_SOCK_OPS_TS_*_CB. The are only 3:
SCHED, SND, and ACK. If the hwtstamp is available from a NIC, I think it would
be quite wasteful to throw it away. ACK can be controlled by the
TCP_SKB_CB(skb)->bpf_txstamp_ack.

Right, let me try this:)

Going back to my earlier bpf_setsockopt(SOL_SOCKET, BPF_TX_TIMESTAMPING)
comment. I think it may as well go back to use the "u8
bpf_sock_ops_cb_flags;" and use the bpf_sock_ops_cb_flags_set() helper to
enable/disable the timestamp related callback hook. May be add one
BPF_SOCK_OPS_TX_TIMESTAMPING_CB_FLAG.

bpf_sock_ops_cb_flags this flag is only used in TCP condition, right?
If that is so, it cannot be suitable for UDP.

I'm thinking of this solution:
1) adding a new flag in SOF_TIMESTAMPING_OPT_BPF flag (in
include/uapi/linux/net_tstamp.h) which can be used by sk->sk_tsflags

probably not in include/uapi/linux/net_tstamp.h. This flag can only be used by a bpf prog (meaning will not be used by user space syscall). More below.

2) flags =   SOF_TIMESTAMPING_OPT_BPF;    bpf_setsockopt(skops,
SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
3) test if sk->sk_tsflags has this new flag in tcp_tx_timestamp() or
in udp_sendmsg()
...

Not sure how many churns/audits is needed to ensure the user space cannot set/clear the SOF_TIMESTAMPING_OPT_BPF bit in sk->sk_tsflags. Could be not much.

May be it is cleaner to leave the sk->sk_tsflags for user space only and having a separate field in "struct sock" to track bpf specific needs. More like your current sk_tsflags_bpf approach but I was thinking to reuse the bpf_sock_ops_cb_flags instead. e.g. "BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)" is used to check if it needs to call a bpf prog to decide if it needs to add tcp header option. Here we want to test if it should call a bpf prog to make a decision on tx timestamp on a skb.

The bpf_sock_ops_cb_flags can be moved from struct tcp_sock to struct sock. It is doable from the bpf side.

All that said, but, yes, it will add some TCP specific enum flag (e.g. BPF_SOCK_OPS_RTO_CB_FLAG) to the struct sock which will not be used by UDP/raw/...etc, so may be keep your current sk_tsflags_bpf approach but rename it to sk_bpf_cb_flags in struct "sock" so that it can be reused for other non tstamp ops in the future? probably a u8 is enough.

This optname is used by the bpf prog only and not usable by user space syscall. If it prefers to stay with bpf_setsockopt (which is fine), it needs a bpf specific optname like the current TCP_BPF_SOCK_OPS_CB_FLAGS which currently sets the tp->bpf_sock_ops_cb_flags. May be a new SK_BPF_CB_FLAGS optname for setting the sk->sk_bpf_cb_flags, like bpf_setsockopt(skops_ctx, SOL_SOCKET, SK_BPF_CB_FLAGS, &val, sizeof(val)) and handle it in the sol_socket_sockopt() alone without calling into sk_{set,get}sockopt. Add a new enum for the optval for the sk_bpf_cb_flags:

enum {
	SK_BPF_CB_TX_TIMESTAMPING = (1 << 0),
	SK_BPF_CB_RX_TIEMSTAMPING = (1 << 1),
};




For tx, one new hook should be at the sendmsg and should be around
tcp_tx_timestamp (?) for tcp. Another hook is __skb_tstamp_tx() which should be

I think there are two points we're supposed to record:
1) the moment tcp/udp_sendmsg() is triggered. It represents the syscall time.
2) another point in tcp_tx_timestamp(). It represents the timestamp of
the last skb in this sendmsg() call.
Users may happen to send a big packet.

hmm... a big packet and sendmsg is blocked waiting for memory?


Err on the side of fewer measurement points. It's always possible to
add more later, but not possible to remove them (depending on whether
BPF infra is ABI).

I also think it is better to start with tcp_tx_timestamp() alone first to keep the patch set simple now. The selftest prog can use a bpf fentry prog to trace the tcp_sendmsg_locked(). This can be revisited later if the bpf fentry prog is not enough.


Overall great suggestion. Thanks a lot for sharing your BPF expertise
on this, Martin.

Thanks!


On using the raw seqno: this data is accessible to anyone root in
namespace (ns_capable) using packet sockets, so as long as it does not
open to more than that, it is logically equivalent to the current
setting.

With seqno the BPF program has to be careful that the same seqno can
be retransmitted, so for instance seeing an ACK before a (second) SND
must be anticipated. That is true for SO_TIMESTAMPING today too.

Ah. It will be a very useful comment to add to the selftests bpf prog.


For datagrams (UDP as well as RAW and many non IP protocols), an
alternative still needs to be found.

In udp/raw/..., I don't know how likely is the user space having "cork->tx_flags & SKBTX_ANY_TSTAMP" set but has neither "READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID" nor "cork->flags & IPCORK_TS_OPT_ID" set. If it is unlikely, may be we can just disallow bpf prog from directly setting skb_shinfo(skb)->tskey for this particular skb.

For all other cases, in __ip[6]_append_data, directly call a bpf prog and also pass the kernel decided tskey to the bpf prog.

The kernel passed tskey could be 0 (meaning the user space has not used it). The bpf prog can give one for the kernel to use. The bpf prog can store the sk_tskey_bpf in the bpf_sk_storage now. Meaning no need to add one to the struct sock. The bpf prog does not have to start from 0 (e.g. start from U32_MAX instead) if it helps.

If the kernel passed tskey is not 0, the bpf prog can just use that one (assuming the user space is doing something sane, like the value in SCM_TS_OPT_ID won't be jumping back and front between 0 to U32_MAX). I hope this is very unlikely also (?) but the bpf prog can probably detect this and choose to ignore this sk.

To solve the above unsupported corner cases, I think we can allow the bpf prog to store something in the shinfo->hwtstamps at the tx path. The bpf-only key could be one of the things to store there. Change __ip[6]_append_data to handle the shinfo->hwtstamps. I think allowing the bpf prog to write to the shinfo->hwtsatmps could be considered later when needed.

[ I may be off tomorrow, so reply could be slower. ]


It seems that using the tskey for bpf extension is always correct and
easy to use.

Could we provide the tskey to users and then let users decide the
better way to identify the call of sendmsg. We could keep the
traditional use of tskey. If without it, people need to figure out a
good way and may find it difficult to use the bpf extension.

I will keep thinking of alternatives for UDP in the meantime.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux