On 9/27/19 11:17 AM, Martin Lau wrote:
> On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
>>
>>
>> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
>>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
>>> valid. It is because bpf_sk_reuseport_detach() can be called from
>>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>>
>> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
>> (after the rcu grace period) instead of sk_destruct() ?
> Agree. It is another way to fix it.
>
> In this patch, I chose to avoid the need to single out a special treatment for
> reuseport_detach_sock() in sk_destruct().
>
> I am happy either way. What do you think?

It seems that since we call reuseport_detach_sock() after the rcu grace period,
another cpu could catch the sk pointer in reuse->socks[] array and use
it right before our cpu frees the socket.

RCU rules are not properly applied here I think.

The rules for deletion are :

1) unpublish object from various lists/arrays/hashes.
2) rcu_grace_period
3) free the object.

If we fix the unpublish (we need to anyway to make the data path safe),
then your patch is not needed ?

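To make the ordering of the three deletion rules concrete, here is a minimal
userspace sketch. It is not kernel code and not real RCU: struct obj, slots[],
reader_lock(), reader_unlock(), wait_for_readers(), publish(), lookup() and
retire() are all hypothetical names invented for the illustration, and the
"grace period" is modelled by a plain reader counter. In the kernel the
corresponding primitives would be rcu_read_lock()/rcu_read_unlock() on the
reader side and synchronize_rcu() or call_rcu() on the update side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct obj {
	int id;
};

#define NSLOTS 4

/* Hypothetical published array, playing the role of reuse->socks[]. */
static _Atomic(struct obj *) slots[NSLOTS];

/* Toy model only: number of readers currently in a read-side section. */
static atomic_int active_readers;

static void reader_lock(void)   { atomic_fetch_add(&active_readers, 1); }
static void reader_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/*
 * Toy "grace period": spin until no reader is active. Any reader that
 * could have seen the old pointer entered before the unpublish, so it
 * has left by the time the counter is observed at zero. Real RCU does
 * this far more cheaply and cannot be starved by new readers.
 */
static void wait_for_readers(void)
{
	while (atomic_load(&active_readers) > 0)
		;
}

/* Make an object visible to readers. */
static void publish(size_t i, struct obj *o)
{
	assert(i < NSLOTS);
	atomic_store(&slots[i], o);
}

/* Must be called between reader_lock() and reader_unlock(). */
static struct obj *lookup(size_t i)
{
	assert(i < NSLOTS);
	return atomic_load(&slots[i]);
}

/* Returns true if an object was unpublished and freed. */
static bool retire(size_t i)
{
	struct obj *old;

	assert(i < NSLOTS);

	/* 1) Unpublish: new readers can no longer find the object. */
	old = atomic_exchange(&slots[i], (struct obj *)NULL);
	if (!old)
		return false;

	/* 2) Grace period: wait for readers that may still hold 'old'. */
	wait_for_readers();

	/* 3) Free: nobody can reach the object any more. */
	free(old);
	return true;
}
```

A reader brackets lookup() with reader_lock()/reader_unlock(); an updater
calls retire(). Swapping steps 1 and 2 in retire() gives the ordering
described above, where a reader can still fetch the pointer from the array
right before it is freed.
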
What about (totally untested, might be horribly wrong)

diff --git a/net/core/sock.c b/net/core/sock.c
index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
 		sk_filter_uncharge(sk, filter);
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
 	}
-	if (rcu_access_pointer(sk->sk_reuseport_cb))
-		reuseport_detach_sock(sk);
 
 	sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
 
@@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
 
 void sk_destruct(struct sock *sk)
 {
-	if (sock_flag(sk, SOCK_RCU_FREE))
+	bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+
+	if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+		reuseport_detach_sock(sk);
+		use_call_rcu = true;
+	}
+	if (use_call_rcu)
 		call_rcu(&sk->sk_rcu, __sk_destruct);
 	else
 		__sk_destruct(&sk->sk_rcu);