We've noticed a situation where an unstable TCP connection can cause the TLS handshake to timeout waiting for userspace to complete it. When this happens, we don't want to return from xs_tls_handshake_sync() with zero, as this will cause the upper xprt to be set CONNECTED, and subsequent attempts to transmit will be returned with -EPIPE. The sunrpc machine does not recover from this situation and will spin attempting to transmit. The return value of tls_handshake_cancel() can be used to detect a race with completion: * tls_handshake_cancel - cancel a pending handshake * Return values: * %true - Uncompleted handshake request was canceled * %false - Handshake request already completed or not found If true, we do not want the upper xprt to be connected, so return -ETIMEDOUT. If false, its possible the handshake request was lost and that may be the reason for our timeout. Again we do not want the upper xprt to be connected, so return -ETIMEDOUT. Ensure that we alway return an error from xs_tls_handshake_sync() if we call tls_handshake_cancel(). Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> --- net/sunrpc/xprtsock.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 1326fbf45a34..95161a8cd138 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2614,11 +2614,10 @@ static int xs_tls_handshake_sync(struct rpc_xprt *lower_xprt, struct xprtsec_par rc = wait_for_completion_interruptible_timeout(&lower_transport->handshake_done, XS_TLS_HANDSHAKE_TO); if (rc <= 0) { - if (!tls_handshake_cancel(sk)) { - if (rc == 0) - rc = -ETIMEDOUT; - goto out_put_xprt; - } + tls_handshake_cancel(sk); + if (rc == 0) + rc = -ETIMEDOUT; + goto out_put_xprt; } rc = lower_transport->xprt_err; base-commit: a9cda7c0ffedb47b23002e109bd26ab2a2ab99c9 -- 2.47.0