Any feedback on whether this patch makes sense or not? ;-) ---8<---------------------------------------------------------------------- >From 54e3c95a4646c8666c6f08766250fd056b06e7f5 Mon Sep 17 00:00:00 2001 From: Federico Sauter <fsauter@xxxxxxxxxxxxxx> Date: Tue, 17 Mar 2015 17:45:28 +0100 Subject: [PATCH] CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE This patch fixes a race condition that occurs when connecting to a NT 3.51 host without specifying a NetBIOS name. In that case a RFC1002_NEGATIVE_SESSION_RESPONSE is received and the SMB negotiation is reattempted, but under some conditions it leads SendReceive() to hang forever while waiting for srv_mutex. This, in turn, sets the calling process to an uninterruptible sleep state and makes it unkillable. The solution is to unlock the srv_mutex acquired in the demux thread *before* going to sleep (after the reconnect error) and before reattempting the connection. --- fs/cifs/connect.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index d05a300..a45e7fc 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -381,6 +381,7 @@ cifs_reconnect(struct TCP_Server_Info *server) rc = generic_ip_connect(server); if (rc) { cifs_dbg(FYI, "reconnect error %d\n", rc); + mutex_unlock(&server->srv_mutex); msleep(3000); } else { atomic_inc(&tcpSesReconnectCount); @@ -388,8 +389,8 @@ cifs_reconnect(struct TCP_Server_Info *server) if (server->tcpStatus != CifsExiting) server->tcpStatus = CifsNeedNegotiate; spin_unlock(&GlobalMid_Lock); + mutex_unlock(&server->srv_mutex); } - mutex_unlock(&server->srv_mutex); } while (server->tcpStatus == CifsNeedReconnect); return rc; -- 1.7.10.4 ---------------------------------------------------------------------->8--- On 03/17/2015 06:13 PM, Federico Sauter wrote:
Greetings, I have been running into an issue (kernel v3.10.40) when connecting to an NT 3.51 Workstation host with a configuration missing the NetBIOS name ('servern' option.) Under some conditions, the mount helper would hang forever in an uninterruptible sleep state. The mount helper comes from busybox. Under some other conditions the mount program would exit with an error and not hang. I managed to create a setup where I could reliably reproduce both cases (the "good case" where the program exits, and the "bad case" where the program hangs forever.) The problem seems to be a race condition between the demux thread and the "main"(?) thread over the srv_mutex. Here is the summary of the functions calls that lead to this problem: * demux thread: cifs_demultiplex_thread() is_smb_response() [connect.c:626 -- case RFC1002_NEGATIVE_SESSION_RESPONSE] cifs_reconnect() [connect.c:380] do { mutex_lock(&server->srv_mutex); generic_ip_connect(server); // on error -> msleep(3000); mutex_unlock(&server->srv_mutex); } while (server->tcpStatus == CifsNeedReconnect); * "main" thread: cifs_negotiate() CIFSSMBNegotiate() SendReceive() [transport.c:821 - thread hangs forever] mutex_lock(&ses->server->srv_mutex); Another interesting piece of information is that in the good case at cifs_reconnect(), generic_ip_connect() returns -EINTR, whereas in the bad case it returns -ECONNREFUSED. In the bad case it all leads to generic_ip_connect() being called over and over again with the same result, but never exiting the loop (thus: hanging.) The following patch works around the issue by not re-attempting the SMB negotiation: ---8<---------------------------------------------------------------------- diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c index 4885a40..863a2da 100644 --- a/fs/cifs/smb1ops.c +++ b/fs/cifs/smb1ops.c @@ -415,10 +415,12 @@ cifs_negotiate(const unsigned int xid, struct cifs_ses *ses) int rc; rc = CIFSSMBNegotiate(xid, ses); if (rc == -EAGAIN) { +#if 0 /* retry only once on 1st time connection */ set_credits(ses->server, 1); rc = CIFSSMBNegotiate(xid, ses); if (rc == -EAGAIN) +#endif rc = -EHOSTDOWN; } return rc; ---------------------------------------------------------------------->8--- I was able, however, to identify a (hopefully) better solution for the issue (see the attached patch.) I would really appreciate your feedback on the attached patch. Please let me know if the solution seems acceptable as well as side-effects-free. We use CIFS to connect to older Windows systems and we have been experiencing similar issues for a while now (which I hope to solve with this patch.) Thanks a lot in advance! :-) Kind regards, Federico Sauter Senior Firmware Programmer
-- To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html