On Fri, Dec 13, 2024 at 9:41 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 12/11/24 1:17 AM, Jason Xing wrote: > > On Wed, Dec 11, 2024 at 10:02 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > >> > >> On 12/7/24 9:37 AM, Jason Xing wrote: > >>> From: Jason Xing <kernelxing@xxxxxxxxxxx> > >>> > >>> Later, I would introduce three points to report some information > >>> to user space based on this. > >>> > >>> Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > >>> --- > >>> include/net/sock.h | 7 +++++++ > >>> net/core/sock.c | 15 +++++++++++++++ > >>> 2 files changed, 22 insertions(+) > >>> > >>> diff --git a/include/net/sock.h b/include/net/sock.h > >>> index 0dd464ba9e46..f88a00108a2f 100644 > >>> --- a/include/net/sock.h > >>> +++ b/include/net/sock.h > >>> @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname, > >>> struct so_timestamping timestamping); > >>> > >>> void sock_enable_timestamps(struct sock *sk); > >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL) > >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op); > >>> +#else > >>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op) > >>> +{ > >>> +} > >>> +#endif > >>> void sock_no_linger(struct sock *sk); > >>> void sock_set_keepalive(struct sock *sk); > >>> void sock_set_priority(struct sock *sk, u32 priority); > >>> diff --git a/net/core/sock.c b/net/core/sock.c > >>> index 74729d20cd00..79cb5c74c76c 100644 > >>> --- a/net/core/sock.c > >>> +++ b/net/core/sock.c > >>> @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname, > >>> return 0; > >>> } > >>> > >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL) > >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op) > >>> +{ > >>> + struct bpf_sock_ops_kern sock_ops; > >>> + > >>> + sock_owned_by_me(sk); > >> > >> I don't think this can be assumed in the time stamping callback. > > > > I'll remove this. > > > >> > >> To remove this assumption for sockops, I believe it needs to stop the bpf prog > >> from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and > >> bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the > >> "u8 op" in "struct bpf_sock_ops_kern *". > > > > Sorry, I don't follow. Could you rephrase your thoughts? Thanks. > > Take a look at bpf_sock_ops_setsockopt in filter.c. To change a sk, it needs to > hold the sk_lock. If you drill down bpf_sock_ops_setsockopt, > sock_owned_by_me(sk) is checked somewhere. Thanks, now I totally follow. > > The sk_lock held assumption is true so far for the existing sockops callbacks. > The new timestamping sockops callback does not necessary have the sk_lock held, Well, for TCP only, there are three new callbacks that I think own the sk_lock already, say, the tcp_sendmsg() will call the lock_sock(). For other types, like you said, maybe not. > so it will break the bpf_sock_ops_setsockopt() assumption on the sk_lock. Agreed, at least for the writer accessing the sk is not allowed actually. > > > > >> > >> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The > >> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is > >> a tcp_sock which also won't work for the udp case. A quick thought is to do > >> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a > >> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids > >> writing/reading the sk when is_fullsock is false. > > > > Do you mean that if we introduce a new field, then bpf prog can > > read/write the socket? > > The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the > sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by > bpf_probe_read...etc. > > When adding udp timestamp callback later, it needs to stop reading the tcp_sock > through skops from the udp callback for sure. Do take a look at > SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp > callback won't break here before moving forward. Agreed. Removing the "sock_ops.sk = sk;" is simple, but I still want the bpf prog to be able to read some fields from the socket under those new callbacks. Let me figure out a feasible solution this weekend if everything goes well. > > > > > Reading the socket could be very helpful in the long run. > > > >> > >> This is a signal that the existing sockops interface has already seen better > >> days. I hope not too many fixes like these are needed to get tcp/udp > >> timestamping to work. > >> > >>> + > >>> + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); > >>> + sock_ops.op = op; > >>> + sock_ops.is_fullsock = 1; > >> > >> I don't think we can assume it is always is_fullsock either. > > > > Right, but for now, TCP seems to need this. I can remove this also. > > I take this back. After reading the existing __skb_tstamp_tx, I think sk is > always fullsock here. Got it. > > > > >> > >>> + sock_ops.sk = sk; > >>> + __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS); > >> > >> Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is > >> needed. > > > > If we use this helper, we will change when the udp bpf extension needs > > to be supported. > > > >> > >> [ I will continue the rest of the set later. ] > > > > Thanks a lot :) > > > >> > >>> +} > >>> +#endif > >>> + > >>> void sock_set_keepalive(struct sock *sk) > >>> { > >>> lock_sock(sk); > >> >