sshd: ClientAlive{CountMax,Interval} fires twice each interval if connection is interrupted

dev@xxxxxxxxxxx · Wed, 26 Sep 2018 13:09:57 +0200

I've discovered a bug in serverloop.c (function
wait_until_can_do_something) which, I believe, has not been reported so
far.
With the latest OpenSSH (7.8p1 as well as current master), sshd
disconnects a non-responding client after approximately
(ClientAliveCountMax / 2) * ClientAliveInterval seconds instead of the
documented ClientAliveCountMax * ClientAliveInterval.

A bisect showed that this behavior was introduced by the fix for bz#2756:
https://bugzilla.mindrot.org/show_bug.cgi?id=2756

How to reproduce:
  1. server #> /sbin/sshd -p 2020 -ddd -f ${sshd_config} 2>&1 | ts
  2. client $> ssh $IP -p2020
  3. interrupt the client's connection (e.g. close the lid of the client
     notebook) and wait for the timeout to happen

${sshd_config}
----
TCPKeepAlive no
ClientAliveInterval 15
ClientAliveCountMax 8
----

The debug log of sshd shows:
----
...
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:94 setting TCPKeepAlive no
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:98 setting ClientAliveInterval 15
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:99 setting ClientAliveCountMax 8
...
[2018-04-26 12:00:16] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:16] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:31] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:31] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:46] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:46] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:01] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:01] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:16] Timeout, client not responding from user $USER x.x.x.x port xxxxx
----

As we can see, keepalive requests are sent twice per interval. I think
the problem is that when the select() call in
wait_until_can_do_something times out, last_client_time is not set to
the current time. In the next iteration select() returns immediately
with a descriptor set in 'writesetp', and because
last_client_time + ClientAliveInterval <= now still holds,
client_alive_check() is called a second time.

A possible fix, which I believe solves this problem without breaking the
fix for bz#2756:
----

diff --git a/serverloop.c b/serverloop.c
index d71724e..7110bf6 100644
--- a/serverloop.c
+++ b/serverloop.c
@@ -290,6 +290,7 @@ wait_until_can_do_something(struct ssh *ssh,

                if (ret == 0) { /* timeout */
                        client_alive_check(ssh);
+                       last_client_time = now;
                } else if (FD_ISSET(connection_in, *readsetp)) {
                        last_client_time = now;
                } else if (last_client_time != 0 && last_client_time +
----

This solves the problem for me. Can someone confirm that this is a bug
and apply either my proposed fix or another one that solves the problem?
If it is confirmed, should I open a Bugzilla ticket, or is that not
necessary?

Thanks,
Samuel
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev