On 10/6/23 2:14 AM, Wenjia Zhang wrote:
On 26.09.23 11:06, D. Wythe wrote:
On 9/26/23 3:18 PM, Alexandra Winter wrote:
On 26.09.23 05:00, D. Wythe wrote:
You are right. The key point is how to ensure the valid of smc sock
during the life time of clc sock, If so, READ_ONCE is good
enough. Unfortunately, I found that there are no such guarantee,
so it's still a life-time problem.
Did you discover a scenario, where clc sock could live longer than
smc sock?
Wouldn't that be a dangerous scenario in itself? I still have some
hope that the lifetime of an smc socket is by design longer
than that of the corresponding tcp socket.
Hi Alexandra,
Yes there is. Considering scenario:
tcp_v4_rcv(skb)
/* req sock */
reqsk = _inet_lookup_skb(skb)
/* listen sock */
sk = reqsk(reqsk)->rsk_listener;
sock_hold(sk);
tcp_check_req(sk)
smc_release /*
release smc listen sock */
__smc_release
smc_close_active() /* smc_sk->sk_state = SMC_CLOSED; */
if
(smc_sk->sk_state == SMC_CLOSED)
smc_clcsock_release();
sock_release(clcsk); /* close clcsock */
sock_put(sk); /* might not the final refcnt */
sock_put(smc_sk) /* might be the final refcnt of smc_sock */
syn_recv_sock(sk...)
/* might be the final refcnt of tcp listen sock */
sock_put(sk);
Fortunately, this scenario only affects smc_syn_recv_sock and
smc_hs_congested, as other callbacks already have locks to protect smc,
which can guarantee that the sk_user_data is either NULL (set in
smc_close_active) or valid under the lock.
I'm kind of confused with this scenario. How could the
smc_clcsock_release()->sock_release(clcsk) happen?
Because the syn_recv_sock happens short prior to accept(), that means
that the &smc->tcp_listen_work is already triggered but the real
accept() is still not happening. At this moment, the incoming
connection is being added into the accept queue. Thus, if the
sk->sk_state is changed from SMC_LISTEN to SMC_CLOSED in
smc_close_active(), there is still
"flush_work(&smc->tcp_listen_work);" after that. That ensures the
smc_clcsock_release() should not happen, if smc_clcsock_accept() is
not finished. Do you think that the execution of the
&smc->tcp_listen_work is already done? Or am I missing something?
Hi wenjia,
Sorry for late reply, we have just returned from vacation.
The smc_clcsock_release here release the listen clcsock rather than the
child clcsock.
So the flush_work might not be helpful for this scenario.
Best wishes,
D. Wythe
Considering the const, maybe
we need to do :
1. hold a refcnt of smc_sock for syn_recv_sock to keep smc sock
valid during life time of clc sock
2. put the refcnt of smc_sock in sk_destruct in tcp_sock to release
the very smc sock .
In that way, we can always make sure the valid of smc sock during
the life time of clc sock. Then we can use READ_ONCE rather
than lock. What do you think ?
I am not sure I fully understand the details what you propose to do.
And it is not only syn_recv_sock(), right?
You need to consider all relations between smc socks and tcp socks;
fallback to tcp, initial creation, children of listen sockets,
variants of shutdown, ... Preferrably a single simple mechanism
covers all situations. Maybe there is such a mechanism already today?
(I don't think clcsock->sk->sk_user_data or sk_callback_lock provide
this general coverage)
If we really have a gap, a general refcnt'ing on smc sock could be a
solution, but needs to be designed carefully.
You are right , we need designed it with care, we will try the
referenced solutions internally first, and I will also send some RFCs
so that everyone can track the latest progress
and make it can be all agreed.
Many thanks to you and the team to help make smc more stable and
robust.
Our pleasure 😁. The stability of smc is important to us too.
Best wishes,
D. Wythe