On 08/31, Paulo Alcantara wrote:
Call cifs_reconnect() to wake up processes waiting on negotiate protocol to handle the case where server abruptly shut down and had no chance to properly close the socket. Simple reproducer: ssh 192.168.2.100 pkill -STOP smbd mount.cifs //192.168.2.100/test /mnt -o ... [never returns] Cc: Rickard Andersson <rickaran@xxxxxxxx> Signed-off-by: Paulo Alcantara (Red Hat) <pc@xxxxxxxxxxxxx> --- fs/smb/client/connect.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index c1c14274930a..e004b515e321 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -656,6 +656,19 @@ allocate_buffers(struct TCP_Server_Info *server) static bool server_unresponsive(struct TCP_Server_Info *server) { + /* + * If we're in the process of mounting a share or reconnecting a session + * and the server abruptly shut down (e.g. socket wasn't closed properly), + * wait for at least an echo interval (+7s from rcvtimeo) when attempting + * to negotiate protocol. + */ + spin_lock(&server->srv_lock); + if (server->tcpStatus == CifsInNegotiate && + time_after(jiffies, server->lstrp + server->echo_interval)) { + spin_unlock(&server->srv_lock); + cifs_reconnect(server, false); + return true; + } /* * We need to wait 3 echo intervals to make sure we handle such * situations right: @@ -667,7 +680,6 @@ server_unresponsive(struct TCP_Server_Info *server) * 65s kernel_recvmsg times out, and we see that we haven't gotten * a response in >60s. */ - spin_lock(&server->srv_lock); if ((server->tcpStatus == CifsGood || server->tcpStatus == CifsNeedNegotiate) && (!server->ops->can_echo || server->ops->can_echo(server)) &&
Maybe, for extra precaution, also worth adding a check in smb2_reconnect() that could catch other cases: --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -278,6 +278,9 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, spin_unlock(&server->srv_lock); return -EAGAIN; } + } else if (server->tcpStatus == CifsInNegotiate) { + spin_unlock(&server->srv_lock); + return -EAGAIN; } /* if server is marked for termination, cifsd will cleanup */ FTR we hit this exact same bug a few months ago and fixed downstream with the above check, combined with using wait_event_timeout() in wait_for_response() (using @echo_interval seconds as well for timeout). Never sent upstream because it looked too much like a hack and the bug was only reproducible on our v5.14-based kernel. Anyway, HTH. Cheers, Enzo