I've found a bug in serverloop.c (function wait_until_can_do_something)
that, as far as I can tell, has not been reported yet.
With the latest OpenSSH (7.8p1 as well as current master), sshd
disconnects a non-responding client after approximately
(ClientAliveCountMax / 2) * ClientAliveInterval seconds.
A bisect showed that the fix introduced for bz#2756 causes this
behavior: https://bugzilla.mindrot.org/show_bug.cgi?id=2756
How to reproduce:
1. server #> /sbin/sshd -p 2020 -ddd -f ${sshd_config} 2>&1 | ts
2. client $> ssh $IP -p2020
3. Make the client unresponsive, e.g. close the lid of the client
notebook, and wait for the timeout to happen.
Contents of ${sshd_config}:
----
TCPKeepAlive no
ClientAliveInterval 15
ClientAliveCountMax 8
----
The debug log of sshd shows:
----
...
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:94 setting TCPKeepAlive no
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:98 setting ClientAliveInterval 15
[2018-04-26 11:59:35] debug3: /tmp/sshd_config:99 setting ClientAliveCountMax 8
...
[2018-04-26 12:00:16] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:16] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:31] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:31] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:46] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:00:46] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:01] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:01] debug2: channel 0: request keepalive@xxxxxxxxxxx confirm 1
[2018-04-26 12:01:16] Timeout, client not responding from user $USER x.x.x.x port xxxxx
----
As we can see, a keepalive request is sent twice in every interval. I
think the problem is this: when the select() call in
wait_until_can_do_something times out, last_client_time is not set to
the current time. During the next iteration select() returns
immediately because 'writesetp' has data ready, and since
last_client_time is still stale, the interval appears to have elapsed
again and client_alive_check() runs a second time.
A possible fix, which I believe solves this problem without breaking
the fix for bz#2756, could be:
----
diff --git a/serverloop.c b/serverloop.c
index d71724e..7110bf6 100644
--- a/serverloop.c
+++ b/serverloop.c
@@ -290,6 +290,7 @@ wait_until_can_do_something(struct ssh *ssh,
 		if (ret == 0) { /* timeout */
 			client_alive_check(ssh);
+			last_client_time = now;
 		} else if (FD_ISSET(connection_in, *readsetp)) {
 			last_client_time = now;
 		} else if (last_client_time != 0 && last_client_time +
----
This solves the problem for me. Can someone confirm that this is a bug
and apply either my proposed fix or another one that solves the
problem? If it is confirmed, should I open a Bugzilla ticket, or is
that not necessary?
Thanks,
Samuel
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev