> -----Original Message----- > From: John Fastabend [mailto:john.fastabend@xxxxxxxxx] > Sent: Friday, October 1, 2021 6:25 AM > To: liujian (CE) <liujian56@xxxxxxxxxx>; john.fastabend@xxxxxxxxx; > daniel@xxxxxxxxxxxxx; jakub@xxxxxxxxxxxxxx; lmb@xxxxxxxxxxxxxx; > edumazet@xxxxxxxxxx; davem@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx; > dsahern@xxxxxxxxxx; kuba@xxxxxxxxxx; ast@xxxxxxxxxx; andrii@xxxxxxxxxx; > kafai@xxxxxx; songliubraving@xxxxxx; yhs@xxxxxx; kpsingh@xxxxxxxxxx; > netdev@xxxxxxxxxxxxxxx; bpf@xxxxxxxxxxxxxxx > Cc: liujian (CE) <liujian56@xxxxxxxxxx> > Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the > tcp_bpf_send_verdict function > > Liu Jian wrote: > > In the following cases: > > We need to redirect the first msg to sock1 and the second msg to sock2. > > The sock lock needs to be released at __SK_REDIRECT and to get another > > sock lock, this will cause the probability that psock->eval is not set > > to __SK_NONE when the second msg comes. > > > > If psock does not set apple bytes, fix this by do the cleanup before > > releasing the sock lock. And keep the original logic in other cases. > > It took me sometime to figure out the above description. Please include a bit > more details here this needs to be backported so we want to be very clear > what the error is and how to trigger it. > > In this case we should list the flow to show how the interleaving of msgs > breaks. > > " > With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls > (or multiple cores) on a single socket 'sk' we could get the following flow. > > msgA, sk msgB, sk > ----------- --------------- > tcp_bpf_sendmsg() > lock(sk) > psock = sk->psock > tcp_bpf_sendmsg() > lock(sk) ... blocking tcp_bpf_send_verdict if (psock- > >eval == NONE) > psock->eval = sk_psock_msg_verdict > .. > < handle SK_REDIRECT case > > release_sock(sk) < lock dropped so grab here > > ret = tcp_bpf_sendmsg_redir > psock = sk->psock > tcp_bpf_send_verdict > lock_sock(sk) ... blocking on B > if (psock->eval == NONE) <- boom. > psock->eval will have msgA state > > The problem here is we dropped the lock on msgA and grabbed it with msgB. > Now we have old state in psock and importantly psock->eval has not been > cleared. So msgB will run whatever action was done on A and the verdict > program may never see it. > " > > Showing the flow makes it painfully obvious why dropping that lock with old > state is broken. > Thanks a lot for such a detailed example. > > > > > Signed-off-by: Liu Jian <liujian56@xxxxxxxxxx> > > We need a fixes tag as well so we can backport this. I will add it. > > > --- > > net/ipv4/tcp_bpf.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index > > d3e9386b493e..02442e43ac4d 100644 > > --- a/net/ipv4/tcp_bpf.c > > +++ b/net/ipv4/tcp_bpf.c > > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, > struct sk_psock *psock, > > bool cork = false, enospc = sk_msg_full(msg); > > struct sock *sk_redir; > > u32 tosend, delta = 0; > > + u32 eval = __SK_NONE; > > int ret; > > > > more_data: > > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, > struct sk_psock *psock, > > break; > > case __SK_REDIRECT: > > sk_redir = psock->sk_redir; > > + if (!psock->apply_bytes) { > > + /* Clean up before releasing the sock lock. */ > > + eval = psock->eval; > > + psock->eval = __SK_NONE; > > + psock->sk_redir = NULL; > > + } > > We need to move above chunk below sk_msg_apply_bytes() so we account > for the bytes and if we zero apply bytes with this send we clear the psock > state. Otherwise we could have the same issue with stale state on the > boundary where apply bytes is met. > > > sk_msg_apply_bytes(psock, tosend); > > <-- put above chunk here. yes, here looks better. > > > if (psock->cork) { > > cork = true; > > Interestingly, I caught the race with cork state, but missed it with the eval > case. Likely because our program redirected to a single sk. > Yes. > > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, > struct sk_psock *psock, > > } > > sk_msg_return(sk, msg, tosend); > > release_sock(sk); > > + > > ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags); > > + > > + if (eval == __SK_REDIRECT) > > Is the 'if' needed? we are in this case because eval is SK_REDIRECT. > Need it, because If the "apply bytes" is not zero, i did not change the logic. > > + sock_put(sk_redir); > > + > > lock_sock(sk); > > if (unlikely(ret < 0)) { > > int free = sk_msg_free_nocharge(sk, msg); > > -- > > 2.17.1 > >