Re: Temporary Crypto Glitches ... ??

Konrad Bucheli <kb@xxxxxxx> · Thu, 11 Nov 2021 12:49:55 +0100

Hi Jochen

We run a few thousand hosts with internet lines of varying quality.
As a fallback procedure, when a connection fails halfway through, we 
retry using only ed25519 crypto. The reason is that it only needs 
smaller packets, which can help if there is (more) trouble with 
bigger network packets.
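A minimal sketch of such a fallback as a shell wrapper. The function name
`ssh_with_fallback` is hypothetical, and this is an illustration rather than
our actual tooling. It assumes that ssh's exit status 255 signals a
connection-level failure (any other status comes from the remote command),
and that both the server's host key and the user key are ed25519, with the
server's ed25519 host key already present in known_hosts.

```shell
#!/bin/sh
# Hypothetical wrapper: try a normal connection first and, only if ssh itself
# fails, retry with ed25519 as the sole host key and user key algorithm, which
# keeps the handshake packets small.
ssh_with_fallback() {
    _rc=0
    ssh "$@" || _rc=$?
    # ssh exits with 255 when the connection itself fails; any other status is
    # the remote command's exit code, so there is nothing to retry.
    if [ "$_rc" -ne 255 ]; then
        return "$_rc"
    fi
    echo "ssh_with_fallback: retrying with ed25519-only crypto" >&2
    # PubkeyAcceptedKeyTypes is the name OpenSSH 7.4 (CentOS 7) understands;
    # newer releases call it PubkeyAcceptedAlgorithms and keep the old name
    # as an alias.
    ssh -o HostKeyAlgorithms=ssh-ed25519 \
        -o PubkeyAcceptedKeyTypes=ssh-ed25519 \
        "$@"
}
```

Usage is the same as plain ssh, e.g. `ssh_with_fallback user@host uptime`.
Restricting the user key type also keeps the client from offering any RSA
keys loaded in the agent.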

Cheers

Konrad

On 09.11.21 17:35, Jochen Bern wrote:
This has got to be one of the weirdest problem descriptions I've ever 
dared publish ...

Yesterday evening, I had problems SSHing from a jump host through an 
IPsec VPN to a couple of customer servers (everything running CentOS 7). 
I was able to work around the problem by fiddling with the crypto 
settings; some more details below.

This morning, those connections were back to normal, but the support 
technician on duty reported that he could not SSH into an entirely 
different server (also CentOS 7, and straight from his workplace 
machine); that problem resolved itself a couple of hours later, too.

Is this just the spookiest coincidence since last Halloween, or did we 
stumble onto a rare, time-triggered malfunction somewhere in the 
OpenSSH(/OpenSSL?) crypto ... ?

-------

Alas, the support technician isn't up to SSH connection debugging, so he 
never ran ssh -vv and couldn't describe any symptom beyond "it times out". I failed 
to save my -vv's output, but I remember that roughly where you'd 
normally get to the KEXINIT, my client claimed to be waiting for some 
ECDH - and then just sat until the timeout.

I usually have two keypairs - one ed25519, one RSA - loaded into my 
agent, and now that things are back to normal, the KEX algorithm chosen is 
curve25519-sha256@xxxxxxxxxx. To work around the problem, I had 
to remove my RSA keypair from the agent and use

$ ssh -o "KexAlgorithms diffie-hellman-group-exchange-sha256" $SERVER

to get logged in.

I started haveged on "my" target machines, but 
/proc/sys/kernel/random/entropy_avail reported more than 3000 bits anyway, 
and my colleague's remote system already had haveged running, so I doubt 
that it actually made any difference.

Our monitoring and automated data fetchers, which use RSA keypairs, 
apparently never had any problem SSHing into those servers. The server, 
set to LogLevel VERBOSE and typically logging

Connection from $CLIENT_IP ...
Postponed publickey for $LOCAL_USER ...

at the beginning of a connection, never wrote the second line for the 
failed attempts. (Since all our connections get SNATed, I'm not sure yet 
whether there are any dangling instances of the *first* line.)
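To look for such dangling first lines, a small awk filter could pair each
"Connection from" line with a later "Postponed publickey" line for the same
client address and port. This is only a sketch: the function name
`find_dangling_connections` is hypothetical, it assumes OpenSSH's usual wording
for these VERBOSE messages (the sample lines in the comments are illustrative,
not taken from our logs), and with SNAT, pairing on address and port can
misfire if a source port gets reused within the log window.

```shell
#!/bin/sh
# Sketch: print sshd "Connection from" lines that were never followed by a
# "Postponed publickey" line for the same client address and port.
# Assumed message wording (LogLevel VERBOSE), e.g.:
#   sshd[123]: Connection from 10.0.0.1 port 51234 on 10.0.0.2 port 22
#   sshd[123]: Postponed publickey for alice from 10.0.0.1 port 51234 ssh2 [preauth]
# Reads syslog-style lines on stdin.
find_dangling_connections() {
    awk '
        # Return "IP:PORT" from the last "from IP port N" in the current line.
        function client(   i, key) {
            key = ""
            for (i = 1; i <= NF - 3; i++)
                if ($i == "from" && $(i + 2) == "port")
                    key = $(i + 1) ":" $(i + 3)
            return key
        }
        /Connection from / {
            k = client()
            if (k != "") { seen[k] = $0; order[++n] = k }
        }
        /Postponed publickey for / {
            k = client()
            if (k != "") done[k] = 1
        }
        END {
            # Report connections in log order that never got authentication.
            for (j = 1; j <= n; j++)
                if (!(order[j] in done)) print seen[order[j]]
        }
    '
}
```

On CentOS 7 it could be fed with, e.g., `find_dangling_connections < /var/log/secure`.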

Nothing in hosts.allow/hosts.deny, and reverse DNS lookups of the client 
IP normally return NXDOMAIN.

Thanks for any pointers,

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev