Dear OpenSSH Team, We use OpenSSH ControlPath/multiplexed sessions heavily to avoid session handshake overheads in a system of CentOS machines performing remote shell commands. The commands are run at different times and take different amounts of time to complete. We're currently experiencing a failure of sshd which relies on timing to trigger a fault and elicit a failure. The failure is typically a process termination. Depending on the loop used to reproduce a crash, the time required can be between five minutes and one hour (approx). On CentOS, failure messages look like [3], on RHEL, reproduced failure messages look like [4]. Previously RedHat supported and fixed a ~very~ similar issue (see ChangeLog entry[1]), however it appears we still see sshd crashes in our system on both CentOS and RHEL (which we know are compiled from the same source code) when using this feature. We are running the latest RPMS for OpenSSH - 6.6.1p1-25.el7_2.x86_64.rpm on both the client and server hosts. When investigating the issue previously, RedHat documented the method we used to reproduce the failure in some detail[2]. I do swap the while loop described on that page for a different loop which is can elicit a failure sooner. The loop used to produce the error in [4] was [5], while I have also written/used [6],[7],[8]. There is probably an infinite number of possible loops. I tried to reproduce this in OpenBSD last May/June and did not see this failure, however, because of the issues around timing, not seeing a failure does not give me confidence there is no fault present. Is there any guidance available on how this can be resolved? ControlMaster is an important and very good feature of OpenSSH without which our remote execution system would not be nearly as efficient or scalable. It is my intention to open a case with RH but sometimes it is difficult to progress such support cases, I'm hoping I can get some assistance to help get this stabilised from wherever I can find it. Sincerely, Rob Mattson Dr Robert Mattson Product Development 380 St Kilda Road, Melbourne, Victoria, 3004 Australia +61 3 9926 0000 phone robert.mattson@xxxxxxxxxx<mailto:robert.mattson@xxxxxxxxxx> www.c4i.com<http://www.c4i.com/> [1] rpm -q --changelog openssh * Wed Jul 15 2015 Jakub Jelen <jjelen@xxxxxxxxxx<mailto:jjelen@xxxxxxxxxx>> 6.6.1p1-15 + 0.9.3-9 ... - Fix race condition with auditing messages answers (#1240613) [2] - https://access.redhat.com/solutions/1521923 [3] from /var/log/secure Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 2 Apr 26 10:42:58 host01 sshd[15138]: error: buffer_get_string_ret: buffer_get failed Apr 26 10:42:58 host01 sshd[15138]: fatal: buffer_get_string: buffer error <snip /> Apr 26 10:56:58 host01 sshd[15293]: fatal: mm_request_receive: read: bad msg_len 4096768 Apr 26 10:56:58 host01 sshd[15293]: pam_unix(sshd:session): session closed for user targetuser Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_receive: read: Connection reset by peer Apr 26 10:56:58 host01 sshd[15296]: fatal: mm_request_send: write: Broken pipe <snip /> Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 53 Apr 26 10:58:58 host01 sshd[16028]: error: buffer_get_string_ret: buffer_get failed Apr 26 10:58:58 host01 sshd[16028]: fatal: buffer_get_string: buffer error Apr 26 10:58:58 host01 sshd[16028]: pam_unix(sshd:session): session closed for user targetuser Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_receive: read: Connection reset by peer Apr 26 10:58:58 host01 sshd[16032]: fatal: mm_request_send: write: Broken pipe <snip /> Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_ret: trying to get more bytes 68 than in buffer 50 Apr 26 11:07:58 host01 sshd[16149]: error: buffer_get_string_ret: buffer_get failed Apr 26 11:07:58 host01 sshd[16149]: fatal: buffer_get_string: buffer error Apr 26 11:07:58 host01 sshd[16149]: pam_unix(sshd:session): session closed for user targetuser Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_receive: read: Connection reset by peer Apr 26 11:07:58 host01 sshd[16152]: fatal: mm_request_send: write: Broken pipe <snip /> Apr 26 11:12:58 host01 sshd[17475]: fatal: mm_request_receive: read: bad msg_len 4529920 Apr 26 11:12:58 host01 sshd[17475]: pam_unix(sshd:session): session closed for user targetuser Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_receive: read: Connection reset by peer Apr 26 11:12:58 host01 sshd[17501]: fatal: mm_request_send: write: Broken pipe [4] time /usr/sbin/sshd -p 1026 -ddd <snip /> debug2: fd 19 setting O_NONBLOCK debug2: fd 11 setting O_NONBLOCK debug2: fd 22 setting O_NONBLOCK debug2: channel 4: rcvd eof debug2: channel 4: output open -> drain debug2: channel 4: obuf empty debug2: channel 4: close_write debug2: channel 4: output drain -> closed debug2: channel 3: read 664 from efd 20 debug2: channel 4: read 664 from efd 22 debug2: channel 3: rwin 2097152 elen 664 euse 1 debug2: channel 3: sent ext data 664 debug2: channel 4: rwin 2097152 elen 664 euse 1 debug2: channel 4: sent ext data 664 debug3: mm_request_receive entering debug3: monitor_read: checking request 124 buffer_get_ret: trying to get more bytes 68 than in buffer 63 <----******* buffer_get_string_ret: buffer_get failed <----******* buffer_get_string: buffer error <----******* debug1: do_cleanup debug1: PAM: cleanup debug1: PAM: closing session debug1: PAM: deleting credentials debug3: PAM: sshpam_thread_cleanup entering real 8m4.228s user 0m0.506s sys 0m1.884s [root@rhel ~]# debug1: server_input_channel_open: ctype session rchan 12 win 2097152 max 32768 debug1: input_session_request debug1: channel 5: new [server-session] debug1: session_new: session 7 debug1: session_open: channel 5 debug1: session_open: session 7: link with channel 5 debug1: server_input_channel_open: confirm session debug1: server_input_channel_req: channel 5 request env reply 0 debug1: session_by_channel: session 7 channel 5 debug1: session_input_channel_req: session 7 req env debug2: Setting env 0: LANG=en_AU.UTF-8 debug1: server_input_channel_req: channel 5 request exec reply 1 debug1: session_by_channel: session 7 channel 5 debug1: session_input_channel_req: session 7 req exec Starting session: command for joe from ::1 port 41194 debug3: mm_audit_run_command entering command sleep .2 debug3: mm_request_send entering: type 114 mm_request_send: write: Broken pipe debug1: do_cleanup debug3: PAM: sshpam_thread_cleanup entering debug3: mm_request_send entering: type 124 mm_request_send: write: Broken pipe [root@rhel ~]# [5] while true; do runCommand .22 runCommand .2 runCommand .2 runCommand .2 runCommand .2 runCommand .2 sleep .333 done [6] while true; do runCommand 3 sleep .6 runCommand 3 sleep .6 runCommand 3 sleep .6 runCommand 3 sleep .6 runCommand 3 sleep .7 done [7] while true; do runCommand 3 runCommand 1 runCommand 1 runCommand 1 runCommand 1 runCommand 2 runCommand 2 runCommand 2 sleep 1 runCommand 1 sleep 1 done [8] while true; do runCommand 1 runCommand 1 runCommand 1 runCommand 1 sleep 1 runCommand 1 sleep 1 done _______________________________________________ openssh-unix-dev mailing list openssh-unix-dev@xxxxxxxxxxx https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev