I was looking at this patch in more detail, and it does look important but I wanted to clarify a few things. In your detailed description you mention that the retry on port 139 is missing a call put_net(0 > ip_connect > generic_ip_connect /* Try port 445 */ > get_net() > ->connect() /* Failed */ > put_net() > generic_ip_connect /* Try port 139 */ > get_net() /* Missing matching put_net() for this get_net().*/ but I found this confusing because generic_ip_connect() doesn't seem to treat the port 445 vs. port 139 differently (there are only two places the function does put_net() and the latter on line 3421 looks like the only one that matters for your example). Here is the snippet from generic_ip_connect(). Could you explain why the retry on port 139 example is different here? rc = kernel_connect(socket, saddr, slen, server->noblockcnt ? O_NONBLOCK : 0); /* * When mounting SMB root file systems, we do not want to block in * connect. Otherwise bail out and then let cifs_reconnect() perform * reconnect failover - if possible. */ if (server->noblockcnt && rc == -EINPROGRESS) rc = 0; if (rc < 0) { cifs_dbg(FYI, "Error %d connecting to server\n", rc); trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc); put_net(cifs_net_ns(server)); sock_release(socket); server->ssocket = NULL; return rc; } On Tue, Feb 18, 2025 at 8:34 AM Wang Zhaolong <wangzhaolong1@xxxxxxxxxx> wrote: > > Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network > namespace.") attempted to fix a netns use-after-free issue by manually > adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add(). > > However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock > after rmmod") pointed out that the approach of manually setting > sk->sk_net_refcnt in the first commit was technically incorrect, as > sk->sk_net_refcnt should only be set for user sockets. It led to issues > like TCP timers not being cleared properly on close. The second commit > moved to a model of just holding an extra netns reference for > server->ssocket using get_net(), and dropping it when the server is torn > down. > > But there remain some gaps in the get_net()/put_net() balancing added by > these commits. The incomplete reference handling in these fixes results > in two issues: > > 1. Netns refcount leaks[1] > > The problem process is as follows: > > ``` > mount.cifs cifsd > > cifs_do_mount > cifs_mount > cifs_mount_get_session > cifs_get_tcp_session > get_net() /* First get net. */ > ip_connect > generic_ip_connect /* Try port 445 */ > get_net() > ->connect() /* Failed */ > put_net() > generic_ip_connect /* Try port 139 */ > get_net() /* Missing matching put_net() for this get_net().*/ > cifs_get_smb_ses > cifs_negotiate_protocol > smb2_negotiate > SMB2_negotiate > cifs_send_recv > wait_for_response > cifs_demultiplex_thread > cifs_read_from_socket > cifs_readv_from_socket > cifs_reconnect > cifs_abort_connection > sock_release(); > server->ssocket = NULL; > /* Missing put_net() here. */ > generic_ip_connect > get_net() > ->connect() /* Failed */ > put_net() > sock_release(); > server->ssocket = NULL; > free_rsp_buf > ... > clean_demultiplex_info > /* It's only called once here. */ > put_net() > ``` > > When cifs_reconnect() is triggered, the server->ssocket is released > without a corresponding put_net() for the reference acquired in > generic_ip_connect() before. it ends up calling generic_ip_connect() > again to retry get_net(). After that, server->ssocket is set to NULL > in the error path of generic_ip_connect(), and the net count cannot be > released in the final clean_demultiplex_info() function. > > 2. Potential use-after-free > > The current refcounting scheme can lead to a potential use-after-free issue > in the following scenario: > > ``` > cifs_do_mount > cifs_mount > cifs_mount_get_session > cifs_get_tcp_session > get_net() /* First get net */ > ip_connect > generic_ip_connect > get_net() > bind_socket > kernel_bind /* failed */ > put_net() > /* after out_err_crypto_release label */ > put_net() > /* after out_err label */ > put_net() > ``` > > In the exception handling process where binding the socket fails, the > get_net() and put_net() calls are unbalanced, which may cause the > server->net reference count to drop to zero and be prematurely released. > > To address both issues, this patch ties the netns reference counting to > the server->ssocket and server lifecycles. The extra reference is now > acquired when the server or socket is created, and released when the > socket is destroyed or the server is torn down. > > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=219792 > > Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") > Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") > Signed-off-by: Wang Zhaolong <wangzhaolong1@xxxxxxxxxx> > --- > fs/smb/client/connect.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c > index f917de020dd5..0d454149f3b4 100644 > --- a/fs/smb/client/connect.c > +++ b/fs/smb/client/connect.c > @@ -300,6 +300,7 @@ cifs_abort_connection(struct TCP_Server_Info *server) > server->ssocket->flags); > sock_release(server->ssocket); > server->ssocket = NULL; > + put_net(cifs_net_ns(server)); > } > server->sequence_number = 0; > server->session_estab = false; > @@ -3115,8 +3116,12 @@ generic_ip_connect(struct TCP_Server_Info *server) > /* > * Grab netns reference for the socket. > * > - * It'll be released here, on error, or in clean_demultiplex_info() upon server > - * teardown. > + * This reference will be released in several situations: > + * - In the failure path before the cifsd thread is started. > + * - In the all place where server->socket is released, it is > + * also set to NULL. > + * - Ultimately in clean_demultiplex_info(), during the final > + * teardown. > */ > get_net(net); > > @@ -3132,10 +3137,8 @@ generic_ip_connect(struct TCP_Server_Info *server) > } > > rc = bind_socket(server); > - if (rc < 0) { > - put_net(cifs_net_ns(server)); > + if (rc < 0) > return rc; > - } > > /* > * Eventually check for other socket options to change from > @@ -3181,9 +3184,6 @@ generic_ip_connect(struct TCP_Server_Info *server) > if (sport == htons(RFC1001_PORT)) > rc = ip_rfc1001_connect(server); > > - if (rc < 0) > - put_net(cifs_net_ns(server)); > - > return rc; > } > > -- > 2.34.3 > > -- Thanks, Steve