On 3/27/24 11:50 AM, Alexei Starovoitov wrote:
On Wed, Mar 27, 2024 at 11:33 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
On Wed, 2024-03-27 at 10:00 -0700, Alexei Starovoitov wrote:
On Wed, Mar 27, 2024 at 9:56 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
On Wed, 2024-03-27 at 09:43 -0700, Alexei Starovoitov wrote:
I ffwded bpf tree with the recent net fixes and caught this:
[ 48.386337] WARNING: CPU: 32 PID: 3276 at net/mptcp/subflow.c:1430
subflow_data_ready+0x147/0x1c0
[ 48.392012] Modules linked in: dummy bpf_testmod(O) [last unloaded:
bpf_test_no_cfi(O)]
[ 48.396609] CPU: 32 PID: 3276 Comm: test_progs Tainted: G
O 6.8.0-12873-g2c43c33bfd23 #1014
#[ 48.467143] Call Trace:
[ 48.469094] <TASK>
[ 48.472159] ? __warn+0x80/0x180
[ 48.475019] ? subflow_data_ready+0x147/0x1c0
[ 48.478068] ? report_bug+0x189/0x1c0
[ 48.480725] ? handle_bug+0x36/0x70
[ 48.483061] ? exc_invalid_op+0x13/0x60
[ 48.485809] ? asm_exc_invalid_op+0x16/0x20
[ 48.488754] ? subflow_data_ready+0x147/0x1c0
[ 48.492159] mptcp_set_rcvlowat+0x79/0x1d0
[ 48.495026] sk_setsockopt+0x6c0/0x1540
It doesn't reproduce all the time though.
Some race?
Known issue?
It was not known to me. Looks like something related to not so recent
changes (rcvlowat support).
Definitely looks lie a race.
If you could share more info about the running context and/or a full
decoded splat it could help, thanks!
This is just running bpf selftests in parallel:
test_progs -j
The end of the splat:
[ 48.500075] __bpf_setsockopt+0x6f/0x90
[ 48.503124] bpf_sock_ops_setsockopt+0x3c/0x90
[ 48.506053] bpf_prog_509ce5db2c7f9981_bpf_test_sockopt_int+0xb4/0x11b
[ 48.510178] bpf_prog_dce07e362d941d2b_bpf_test_socket_sockopt+0x12b/0x132
[ 48.515070] bpf_prog_348c9b5faaf10092_skops_sockopt+0x954/0xe86
[ 48.519050] __cgroup_bpf_run_filter_sock_ops+0xbc/0x250
[ 48.523836] tcp_connect+0x879/0x1160
[ 48.527239] ? ktime_get_with_offset+0x8d/0x140
[ 48.531362] tcp_v6_connect+0x50c/0x870
[ 48.534609] ? mptcp_connect+0x129/0x280
[ 48.538483] mptcp_connect+0x129/0x280
[ 48.542436] __inet_stream_connect+0xce/0x370
[ 48.546664] ? rcu_is_watching+0xd/0x40
[ 48.549063] ? lock_release+0x1c4/0x280
[ 48.553497] ? inet_stream_connect+0x22/0x50
[ 48.557289] ? rcu_is_watching+0xd/0x40
[ 48.560430] inet_stream_connect+0x36/0x50
[ 48.563604] bpf_trampoline_6442491565+0x49/0xef
[ 48.567770] ? security_socket_connect+0x34/0x50
[ 48.575400] inet_stream_connect+0x5/0x50
[ 48.577721] __sys_connect+0x63/0x90
[ 48.580189] ? bpf_trace_run2+0xb0/0x1a0
[ 48.583171] ? rcu_is_watching+0xd/0x40
[ 48.585802] ? syscall_trace_enter+0xfb/0x1e0
[ 48.588836] __x64_sys_connect+0x14/0x20
Ouch, it looks bad. BPF should not allow any action on mptcp subflows
that go through sk_socket. They touch the mptcp main socket, which is
_not_ protected by the subflow socket lock.
AFICS currently the relevant set of racing sockopt allowed by bpf boils
down to SO_RCVLOWAT only - sk_setsockopt(SO_RCVLOWAT) will call sk-
sk_socket->ops->set_rcvlowat()
So something like the following (completely untested) should possibly
address the issue at hand, but I think it would be better/safer
completely disable ebpf on mptcp subflows, WDYT?
Thanks,
Paolo
---
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index dcd1c76d2a3b..6e5e64c2cf89 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1493,6 +1493,9 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
struct mptcp_subflow_context *subflow;
int space, cap;
+ if (has_current_bpf_ctx())
+ return -EINVAL;
+
Looks fine to me.
Martin,
Do you have any better ideas?
The splat explains the race.
In this case setget_sockopt test happen to run in parallel
with mptcp/bpf test and the former one was TCP connect request
but it was for subflow.
We can disable that callback when tcp flow is a subflow,
but that doesn't feel right.
I am also not sure if we can disable all set/getsockopt for tcp subflows. Not
clear if there is use case that depends on this to setsockopt the subflow.
I think Paolo's targeted fix is cleaner.
I will also go with Paolo's fix. The radius is smaller. I don't have a better idea.
Unrelated, is there a way to tell if a tcp_sock is a subflow? bpf prog can use
it to decide if it wants to setsockopt on a subflow or not.