Re: [PATCH net] bpf, sockmap: Restore sk_prot ops when psock is removed from sockmap

"dongchenchen (A)" <dongchenchen2@xxxxxxxxxx> · Fri, 7 Mar 2025 21:18:14 +0800

On 2025/3/6 13:06, Jiayuan Chen wrote:
On Wed, Mar 05, 2025 at 10:12:43AM +0800, Cong Wang wrote:
On Wed, Mar 05, 2025 at 10:02:34PM +0800, Dong Chenchen wrote:
WARNING: CPU: 0 PID: 6558 at net/core/sock_map.c:1703 sock_map_close+0x3c4/0x480
Modules linked in:
CPU: 0 UID: 0 PID: 6558 Comm: syz-executor.14 Not tainted 6.14.0-rc5+ #238
RIP: 0010:sock_map_close+0x3c4/0x480
Call Trace:
  <TASK>
  inet_release+0x144/0x280
  __sock_release+0xb8/0x270
  sock_close+0x1e/0x30
  __fput+0x3c6/0xb30
  __fput_sync+0x7b/0x90
  __x64_sys_close+0x90/0x120
  do_syscall_64+0x5d/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The root cause is:
sock_hash_update_common
   sock_map_unref
     sock_map_del_link
       psock->psock_update_sk_prot(sk, psock, false);
	//false won't restore proto
     sk_psock_put
        rcu_assign_sk_user_data(sk, NULL);
inet_release
   sk->sk_prot->close
     sock_map_close
       WARN(sk->sk_prot->close == sock_map_close)

When psock is removed from sockmap, sock_map_del_link() still set
sk->sk_prot to bpf proto instead of restore it (for incorrect restore
value). sock release will triger warning of sock_map_close() for
recurse after psock drop.
But sk_psock_drop() restores it with sk_psock_restore_proto() after the
psock reference count goes to zero. So how could the above happen?

By the way, it would be perfect if you could add a test case for it
together with this patch (a followup patch is fine too).

Thanks!
I also have the same question as Cong, and I'll describe it in more detail
here:

'psock->saved_close' is always tcp_close (if your socket is TCP) and will
not change regardless of whether restore is executed or not. So when
entering the function sock_map_close() and encountering
WARN_ON_ONCE(saved_close == sock_map_close), 'saved_close' can only come
from 'saved_close = READ_ONCE(sk->sk_prot)->close'. This means we obtain
sock through psock = sk_psock(sk) and then enter the branch code after
judging it to be null.
'''
sock_map_close()
{
	psock = sk_psock(sk);
	if (likely(psock)) {
		saved_close = psock->saved_close;
	} else {
		saved_close = READ_ONCE(sk->sk_prot)->close;
	}
	WARN_ON_ONCE(saved_close == sock_map_close);
}
'''
However, before psock becomes null, we have actually successfully executed
the restore:
'''
void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{
     write_lock_bh(&sk->sk_callback_lock);
     sk_psock_restore_proto(sk, psock); // restore correctly
     rcu_assign_sk_user_data(sk, NULL); // set psock null
    ...
}
'''

Passing false to psock_update_sk_prot may be problematic, but it shouldn't
cause the issue described in the commit message.
It may be necessary to provide more information on how sockmap is used to
determine the issue. :)

Thanks.
Thanks for your review!
sk_psock_restore_protofail to restore sk_prot for tcp_ulp set concurrently.

sock_hash_update_common
  sock_map_unref
    sock_map_del_link
                              setsockopt(TCP_ULP)
                                tcp_set_ulp
                                  icsk->icsk_ulp_ops = ulp_ops; //set ulp
    sk_psock_put
      sk_psock_restore_proto
        tcp_bpf_update_proto
          if (inet_csk_has_ulp(sk))
             tls_update
               ctx->sk_proto = p; //not update sk->sk_prot for ulp set
inet_release
  sk->sk_prot->close
    sock_map_close
      WARN(sk->sk_prot->close == sock_map_close)

Maybe we can fix it as:

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 99ca4465f702..791cc32dccfd 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -1025,15 +1025,22 @@ static int tls_init(struct sock *sk)
 static void tls_update(struct sock *sk, struct proto *p,
                       void (*write_space)(struct sock *sk))
 {
+       struct sk_psock *psock = sk_psock(sk);
        struct tls_context *ctx;
+       bool restore = true;

        WARN_ON_ONCE(sk->sk_prot == p);

+       if (psock)
+               restore = !!refcount_read(&psock->refcnt);
+
        ctx = tls_get_ctx(sk);
        if (likely(ctx)) {
                ctx->sk_write_space = write_space;
                ctx->sk_proto = p;
-       } else {
+       }
+
+       if (ctx == NULL || !restore) {
                /* Pairs with lockless read in sk_clone_lock(). */
                WRITE_ONCE(sk->sk_prot, p);
                sk->sk_write_space = write_space;