I have an NXP i.MX6-based armv7l-dey-linux-gnueabihf system on which I am seeing some as-yet-unexplained behavior in sshd when it is compiled with Arm GCC 10/11/12. When attempting to scp/slogin/ssh to 'root@<host>', where <host> is a name, an IPv4 address, or an IPv6 address, the server quickly closes the connection without prompting for a password.

The one variable that consistently determines whether things work is the toolchain. With arm-dey-linux-gnueabi-gcc 8.2.0 from Digi Embedded Yocto (DEY), scp/slogin/ssh work. With arm-none-linux-gnueabihf-gcc 10/11/12 they do not, failing consistently and in the same way across all three. The specific Arm toolchains are:

    https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-arm-none-linux-gnueabihf.tar.xz
    https://developer.arm.com/-/media/Files/downloads/gnu/11.3.rel1/binrel/arm-gnu-toolchain-11.3.rel1-x86_64-arm-none-linux-gnueabihf.tar.xz
    https://developer.arm.com/-/media/Files/downloads/gnu/12.3.rel1/binrel/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-linux-gnueabihf.tar.xz

The original version of openssh under which this was observed was 9.3p1, configured as follows:

    ${BuildRoot}/third_party/openssh/openssh-9.3p1/configure -C \
        AR="${AR}" CPP="${CPP}" CC="${CC}" CXX="${CXX}" RANLIB="${RANLIB}" STRIP="${STRIP}" \
        CPPFLAGS="--sysroot=${SYSROOT} -mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon -isystem ${SYSROOT}/usr/include -I${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/include -I${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/openssl/usr/include" \
        CFLAGS="--sysroot=${SYSROOT} -mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon -fno-omit-frame-pointer -fno-strict-aliasing" \
        LDFLAGS="--sysroot=${SYSROOT} -L${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/lib/ -L${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/libedit/usr/lib/ -Wl,-rpath-link -Wl,${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/ncurses/usr/lib -Wl,-rpath-link -Wl,${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/zlib/usr/lib" \
        --build=x86_64-pc-linux-gnu \
        --host=arm-dey-linux-gnueabi \
        --target=arm-dey-linux-gnueabi \
        --disable-strip \
        --with-hardening \
        --with-libedit="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/libedit/usr" \
        --with-mantype=cat \
        --with-openssl \
        --with-pid-dir=/var/run \
        --with-privsep-path=/var/run/sshd \
        --with-ssl-dir="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/openssl/usr" \
        --with-stackprotect \
        --with-zlib-version-check \
        --with-zlib="${BuildRoot}/results/${PRODUCT}/arm/gnu-toolchain/12.3.1/release/third_party/zlib/usr" \
        --without-kerberos5 \
        --without-ldns \
        --without-maildir \
        --without-pam \
        --without-rpath \
        --without-selinux \
        --without-xauth \
        --prefix=/usr \
        --sysconfdir=/etc/ssh \
        --localstatedir=/var

Were it just one version, I’d have expected a potential code generation bug with the
compiler; however, across three different versions from three different GCC eras, I’m inclined to believe this isn’t a code-generation issue.

In all failures, the ssh client fails with:

    debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

followed by:

    Connection closed by <IP address of server> port 22

In all failures, the ssh daemon fails with:

    debug1: expecting SSH2_MSG_KEX_ECDH_INIT [preauth]
    debug3: receive packet: type 30 [preauth]
    debug3: mm_sshkey_sign entering [preauth]
    debug3: mm_request_send entering: type 6 [preauth]
    debug3: mm_sshkey_sign: waiting for MONITOR_ANS_SIGN [preauth]
    debug3: mm_request_receive_expect entering: type 7 [preauth]
    debug3: mm_request_receive entering [preauth]
    debug3: mm_request_receive entering
    debug3: monitor_read: checking request 6
    debug3: mm_answer_sign
    debug3: mm_answer_sign: hostkey proof signature 0x1164880(100)
    debug3: mm_request_send entering: type 7
    debug2: monitor_read: 6 used once, disabling now
    debug3: send packet: type 31 [preauth]
    debug3: send packet: type 21 [preauth]
    debug2: set_newkeys: mode 1 [preauth]
    debug1: rekey after 134217728 blocks [preauth]
    debug1: monitor_read_log: child log fd closed
    debug3: mm_request_receive entering
    debug1: do_cleanup
    debug1: Killing privsep child 2544

My first inclination was that this was a SHA-1 key algorithm deprecation issue; however, I verified that was not the case. And, again, the fact that the compiler is the only variable indicated it likely was not.

My second inclination was that this was perhaps an optimization issue with the later versions of GCC, so I compiled OpenSSH with -O0. No change. Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was to try a different ssh client. I’d been using 8.2p1 (Ubuntu 20.04); however, 8.1p1 and 8.6p1 (macOS) as well as a locally-built 9.5p1 yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was to iterate through the sshd_config configuration.
I commented out each of the 10 lines one by one and retested, which yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was that perhaps OpenSSL was creating an issue. I tried 1.1.1w (up from my 1.1.1s) and 3.1.4, which yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My next inclination was that perhaps it was OpenSSH version-specific. I tried up-revving to 9.5p1 and then down-revving to 7.9p1, which yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

My last inclination was to do a side-by-side comparison of the configuration and compilation output between Digi DEY 8.2.0 and Arm GNU Toolchain 12. The key differences were in the results of checking:

    if ${CC} supports compile flag -fzero-call-used-regs=all
    if ${CC} supports compile flag -ftrivial-auto-var-init=zero
    for sys/sysctl.h
    for library containing login
    for closefrom
    for close_range
    for library containing dlopen
    for arc4random
    for arc4random_buf
    for arc4random_uniform
    if libc defines sys_errlist
    if libc defines sys_nerr
    for library containing res_query
    for library containing dn_expand
    if res_query will link
    for _getshort
    for _getlong

While most of these configuration differences seem trivial and innocuous, the -fzero-call-used-regs=all and -ftrivial-auto-var-init=zero code generation options seemed the most likely among them to affect the point at which the client/daemon interaction was failing. So, I forcibly disabled both, which yielded the same results: Digi DEY 8.2.0 works; Arm GNU Toolchain 10/11/12 does not.

Does anyone recognize this as a familiar failure mode? Beyond that, any thoughts or recommendations on zeroing in further on the potential root cause?

Best,

Grant
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev