deadlock in cifs mounts when server connection goes down

Greetings:

We have a product that uses cifs mounts to catalog digital music on
cifs/smb shares on our users' networks. We have noticed that if a
computer leaves the network during cataloging while its share is
mounted, any processes that attempt access will be held in the 'D'
state (which is expected); however, there doesn't appear to be any
time-out after which the client gives up trying to reconnect to the
missing server.

Upon examining the source, I believe the issue is in this loop in
fs/cifs/connect.c:186 (inside cifs_reconnect() proc)

   while ((server->tcpStatus != CifsExiting) &&
          (server->tcpStatus != CifsGood)) {
      try_to_freeze();
      if (server->addr.sockAddr6.sin6_family == AF_INET6)
         rc = ipv6_connect(server);
      else
         rc = ipv4_connect(server);
      if (rc) {
         cFYI(1, "reconnect error %d", rc);
         msleep(3000);
      } else {
         atomic_inc(&tcpSesReconnectCount);
         spin_lock(&GlobalMid_Lock);
         if (server->tcpStatus != CifsExiting)
            server->tcpStatus = CifsGood;
         server->sequence_number = 0;
         spin_unlock(&GlobalMid_Lock);
   /*    atomic_set(&server->inFlight,0);*/
         wake_up(&server->response_q);
      }
   }

It looks like if the server leaves the network, this loop will never
exit: every connect attempt fails, so tcpStatus never becomes CifsGood,
and nothing inside the loop itself sets it to CifsExiting.

While I was looking at the callers of cifs_reconnect(), I also noticed
that the return code is consistently ignored, which suggests they
assume cifs_reconnect() only returns once the connection is viable
again.

At first glance, the most direct fix would be to add a retry counter
to the loop and exit if the reconnect does not succeed within some
(user-configurable?) number of attempts; however, I am not familiar
enough with the cifs code base to anticipate any regressions that
might result from that.
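
To make the retry-counter idea concrete, here is a rough user-space
sketch, not kernel code and not a tested patch. The names fake_server,
try_connect(), reconnect_bounded() and max_attempts are made up for
this illustration, and the connect attempt is simulated; a real change
would keep the existing msleep(3000) between attempts and the locking
around tcpStatus.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel state; not the real cifs types. */
enum tcp_status { STATUS_NEED_RECONNECT, STATUS_GOOD, STATUS_EXITING };

struct fake_server {
    enum tcp_status status;
    int attempts_until_up;  /* simulated: failures left before the server
                               answers; negative means it never comes back */
};

/* Simulated connect attempt: 0 on success, -EHOSTUNREACH on failure. */
static int try_connect(struct fake_server *srv)
{
    if (srv->attempts_until_up < 0)
        return -EHOSTUNREACH;
    if (srv->attempts_until_up > 0) {
        srv->attempts_until_up--;
        return -EHOSTUNREACH;
    }
    return 0;
}

/*
 * Bounded variant of the reconnect loop: same exit conditions as the
 * original, plus a cap on the number of connect attempts.
 * Returns 0 once connected, -ETIMEDOUT after max_attempts failures,
 * or -ECANCELED if the server was already marked as exiting.
 */
int reconnect_bounded(struct fake_server *srv, int max_attempts)
{
    int attempt = 0;

    assert(max_attempts > 0);

    while (srv->status != STATUS_EXITING && srv->status != STATUS_GOOD) {
        if (attempt++ >= max_attempts)
            return -ETIMEDOUT;      /* give up; caller must handle this */

        if (try_connect(srv) != 0)
            continue;               /* the kernel loop sleeps 3 s here */

        srv->status = STATUS_GOOD;
    }

    return srv->status == STATUS_GOOD ? 0 : -ECANCELED;
}
```

With a server that answers on the third attempt,
reconnect_bounded(&srv, 5) returns 0; with one that never answers it
returns -ETIMEDOUT after five attempts. The cap only helps if the
callers stop ignoring the return code and fail their pending requests
when it is non-zero.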

Any thoughts/comments/suggestions on supplying a fail-safe break out
of this loop? I'm guessing if we can't reconnect then we should treat
this connection as unmounted and fail out of any syscalls, right?

Best regards,
David

--
David Kondrad
Software Design Engineer
Home Systems Division
Legrand, North America

david.kondrad@xxxxxxxxxx
www.legrand.us/onq