On 2/5/25 7:34 AM, Jason Xing wrote:
On Wed, Feb 5, 2025 at 11:22 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
Jason Xing wrote:
Users can write the following code to enable the bpf extension:
int flags = SK_BPF_CB_TX_TIMESTAMPING;
int opts = SK_BPF_CB_FLAGS;
bpf_setsockopt(skops, SOL_SOCKET, opts, &flags, sizeof(flags));
Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
---
include/net/sock.h | 3 +++
include/uapi/linux/bpf.h | 8 ++++++++
net/core/filter.c | 23 +++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 1 +
4 files changed, 35 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..7916982343c6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -303,6 +303,7 @@ struct sk_filter;
* @sk_stamp: time stamp of last packet received
* @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
* @sk_tsflags: SO_TIMESTAMPING flags
+ * @sk_bpf_cb_flags: used in bpf_setsockopt()
* @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
* Sockets that can be used under memory reclaim should
* set this to false.
@@ -445,6 +446,8 @@ struct sock {
u32 sk_reserved_mem;
int sk_forward_alloc;
u32 sk_tsflags;
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
+ u32 sk_bpf_cb_flags;
__cacheline_group_end(sock_write_rxtx);
__cacheline_group_begin(sock_write_tx);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2acf9b336371..6116eb3d1515 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6913,6 +6913,13 @@ enum {
BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
};
+/* Definitions for bpf_sk_cb_flags */
+enum {
+ SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
+ SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
+ SK_BPF_CB_TX_TIMESTAMPING
+};
+
/* List of known BPF sock_ops operators.
* New entries can only be added at the end
*/
@@ -7091,6 +7098,7 @@ enum {
TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
+ SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
};
enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ec162dd83c4..1c6c07507a78 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5222,6 +5222,25 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
.arg1_type = ARG_PTR_TO_CTX,
};
+static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
+{
+ u32 sk_bpf_cb_flags;
+
+ if (getopt) {
+ *(u32 *)optval = sk->sk_bpf_cb_flags;
+ return 0;
+ }
+
+ sk_bpf_cb_flags = *(u32 *)optval;
+
+ if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
+ return -EINVAL;
+
+ sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
I don't know BPF internals that well:
Is there mutual exclusion between these sol_socket_sockopt calls?
Yep. There is a sock_owned_by_me() in the earlier code path of sol_socket_sockopt.
Another reader is tcp_tx_timestamp() in patch 11, which should also hold the
sock lock.
Or do these sk field accesses need WRITE_ONCE/READ_ONCE?
To cover this potential data race, just in case a bpf program doesn't use
it as we expect, I think I will add this annotation in v9.
Jason, there should not be a data race issue in sol_socket_sockopt(). A bpf program
cannot call sol_socket_sockopt() without holding the lock; patch 4 was added exactly to
ensure that.
The situation is similar to the existing tcp_sk(sk)->bpf_sock_ops_cb_flags. That
is also a plain access, which makes it clear that all reads and writes happen under
sock_owned_by_me().