Re: [PATCH bpf-next v8 01/12] bpf: add support for bpf_setsockopt()

Jason Xing <kerneljasonxing@xxxxxxxxx> · Wed, 5 Feb 2025 23:34:41 +0800

On Wed, Feb 5, 2025 at 11:22 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > Users can write the following code to enable the bpf extension:
> > int flags = SK_BPF_CB_TX_TIMESTAMPING;
> > int opts = SK_BPF_CB_FLAGS;
> > bpf_setsockopt(skops, SOL_SOCKET, opts, &flags, sizeof(flags));
> >
> > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > ---
> >  include/net/sock.h             |  3 +++
> >  include/uapi/linux/bpf.h       |  8 ++++++++
> >  net/core/filter.c              | 23 +++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  1 +
> >  4 files changed, 35 insertions(+)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 8036b3b79cd8..7916982343c6 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -303,6 +303,7 @@ struct sk_filter;
> >    *  @sk_stamp: time stamp of last packet received
> >    *  @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
> >    *  @sk_tsflags: SO_TIMESTAMPING flags
> > +  *  @sk_bpf_cb_flags: used in bpf_setsockopt()
> >    *  @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
> >    *                     Sockets that can be used under memory reclaim should
> >    *                     set this to false.
> > @@ -445,6 +446,8 @@ struct sock {
> >       u32                     sk_reserved_mem;
> >       int                     sk_forward_alloc;
> >       u32                     sk_tsflags;
> > +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> > +     u32                     sk_bpf_cb_flags;
> >       __cacheline_group_end(sock_write_rxtx);
> >
> >       __cacheline_group_begin(sock_write_tx);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 2acf9b336371..6116eb3d1515 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6913,6 +6913,13 @@ enum {
> >       BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
> >  };
> >
> > +/* Definitions for bpf_sk_cb_flags */
> > +enum {
> > +     SK_BPF_CB_TX_TIMESTAMPING       = 1<<0,
> > +     SK_BPF_CB_MASK                  = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
> > +                                        SK_BPF_CB_TX_TIMESTAMPING
> > +};
> > +
> >  /* List of known BPF sock_ops operators.
> >   * New entries can only be added at the end
> >   */
> > @@ -7091,6 +7098,7 @@ enum {
> >       TCP_BPF_SYN_IP          = 1006, /* Copy the IP[46] and TCP header */
> >       TCP_BPF_SYN_MAC         = 1007, /* Copy the MAC, IP[46], and TCP header */
> >       TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
> > +     SK_BPF_CB_FLAGS         = 1009, /* Used to set socket bpf flags */
> >  };
> >
> >  enum {
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 2ec162dd83c4..1c6c07507a78 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5222,6 +5222,25 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >       .arg1_type      = ARG_PTR_TO_CTX,
> >  };
> >
> > +static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
> > +{
> > +     u32 sk_bpf_cb_flags;
> > +
> > +     if (getopt) {
> > +             *(u32 *)optval = sk->sk_bpf_cb_flags;
> > +             return 0;
> > +     }
> > +
> > +     sk_bpf_cb_flags = *(u32 *)optval;
> > +
> > +     if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
> > +             return -EINVAL;
> > +
> > +     sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
>
> I don't know BPF internals that well:
>
> Is there mutual exclusion between these sol_socket_sockopt calls?
> Or do these sk field accesses need WRITE_ONCE/READ_ONCE.

According to the existing callbacks (like
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB which I used in the selftests) in
include/uapi/linux/bpf.h, they are under the socket lock protection.
And the correct use of this feature is to set during the 3-way
handshake that also is protected by lock. But after you remind me of
this potential data race issue, just in case bpf program doesn't use
it as we expect, I think I will add the this annotation in v9.

Thanks,
Jason