Re: [PATCH net-next v4 02/11] net-timestamp: prepare for bpf prog use

Jason Xing <kerneljasonxing@xxxxxxxxx> · Fri, 13 Dec 2024 22:42:28 +0800

On Fri, Dec 13, 2024 at 9:41 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 12/11/24 1:17 AM, Jason Xing wrote:
> > On Wed, Dec 11, 2024 at 10:02 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
> >>
> >> On 12/7/24 9:37 AM, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@xxxxxxxxxxx>
> >>>
> >>> Later, I would introduce three points to report some information
> >>> to user space based on this.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> >>> ---
> >>>    include/net/sock.h |  7 +++++++
> >>>    net/core/sock.c    | 15 +++++++++++++++
> >>>    2 files changed, 22 insertions(+)
> >>>
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index 0dd464ba9e46..f88a00108a2f 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -2920,6 +2920,13 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>                          struct so_timestamping timestamping);
> >>>
> >>>    void sock_enable_timestamps(struct sock *sk);
> >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op);
> >>> +#else
> >>> +static inline void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>> +{
> >>> +}
> >>> +#endif
> >>>    void sock_no_linger(struct sock *sk);
> >>>    void sock_set_keepalive(struct sock *sk);
> >>>    void sock_set_priority(struct sock *sk, u32 priority);
> >>> diff --git a/net/core/sock.c b/net/core/sock.c
> >>> index 74729d20cd00..79cb5c74c76c 100644
> >>> --- a/net/core/sock.c
> >>> +++ b/net/core/sock.c
> >>> @@ -941,6 +941,21 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >>>        return 0;
> >>>    }
> >>>
> >>> +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_SYSCALL)
> >>> +void bpf_skops_tx_timestamping(struct sock *sk, struct sk_buff *skb, int op)
> >>> +{
> >>> +     struct bpf_sock_ops_kern sock_ops;
> >>> +
> >>> +     sock_owned_by_me(sk);
> >>
> >> I don't think this can be assumed in the time stamping callback.
> >
> > I'll remove this.
> >
> >>
> >> To remove this assumption for sockops, I believe it needs to stop the bpf prog
> >> from calling a few bpf helpers. In particular, the bpf_sock_ops_cb_flags_set and
> >> bpf_sock_ops_setsockopt. This should be easy by asking the helpers to check the
> >> "u8 op" in "struct bpf_sock_ops_kern *".
> >
> > Sorry, I don't follow. Could you rephrase your thoughts? Thanks.
>
> Take a look at bpf_sock_ops_setsockopt in filter.c. To change a sk, it needs to
> hold the sk_lock. If you drill down bpf_sock_ops_setsockopt,
> sock_owned_by_me(sk) is checked somewhere.

Thanks, now I totally follow.

>
> The sk_lock held assumption is true so far for the existing sockops callbacks.
> The new timestamping sockops callback does not necessary have the sk_lock held,

Well, for TCP only, there are three new callbacks that I think own the
sk_lock already, say, the tcp_sendmsg() will call the lock_sock(). For
other types, like you said, maybe not.

> so it will break the bpf_sock_ops_setsockopt() assumption on the sk_lock.

Agreed, at least for the writer accessing the sk is not allowed actually.

>
> >
> >>
> >> I just noticed a trickier one, sockops bpf prog can write to sk->sk_txhash. The
> >> same should go for reading from sk. Also, sockops prog assumes a fullsock sk is
> >> a tcp_sock which also won't work for the udp case. A quick thought is to do
> >> something similar to is_fullsock. May be repurpose the is_fullsock somehow or a
> >> new u8 is needed. Take a look at SOCK_OPS_{GET,SET}_FIELD. It avoids
> >> writing/reading the sk when is_fullsock is false.
> >
> > Do you mean that if we introduce a new field, then bpf prog can
> > read/write the socket?
>
> The same goes for writing the sk, e.g. writing the sk->sk_txhash. It needs the
> sk_lock held. Reading may be ok-ish. The bpf prog can read it anyway by
> bpf_probe_read...etc.
>
> When adding udp timestamp callback later, it needs to stop reading the tcp_sock
> through skops from the udp callback for sure. Do take a look at
> SOCK_OPS_GET_TCP_SOCK_FIELD. I think we need to ensure the udp timestamp
> callback won't break here before moving forward.

Agreed. Removing the "sock_ops.sk = sk;" is simple, but I still want
the bpf prog to be able to read some fields from the socket under
those new callbacks.

Let me figure out a feasible solution this weekend if everything goes well.

>
> >
> > Reading the socket could be very helpful in the long run.
> >
> >>
> >> This is a signal that the existing sockops interface has already seen better
> >> days. I hope not too many fixes like these are needed to get tcp/udp
> >> timestamping to work.
> >>
> >>> +
> >>> +     memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
> >>> +     sock_ops.op = op;
> >>> +     sock_ops.is_fullsock = 1;
> >>
> >> I don't think we can assume it is always is_fullsock either.
> >
> > Right, but for now, TCP seems to need this. I can remove this also.
>
> I take this back. After reading the existing __skb_tstamp_tx, I think sk is
> always fullsock here.

Got it.

>
> >
> >>
> >>> +     sock_ops.sk = sk;
> >>> +     __cgroup_bpf_run_filter_sock_ops(sk, &sock_ops, CGROUP_SOCK_OPS);
> >>
> >> Same here. sk may not be fullsock. BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops) is
> >> needed.
> >
> > If we use this helper, we will change when the udp bpf extension needs
> > to be supported.
> >
> >>
> >> [ I will continue the rest of the set later. ]
> >
> > Thanks a lot :)
> >
> >>
> >>> +}
> >>> +#endif
> >>> +
> >>>    void sock_set_keepalive(struct sock *sk)
> >>>    {
> >>>        lock_sock(sk);
> >>
>