Re: Race condition when using ControlMaster=auto with simultaneous connections

Demi Marie Obenour <demiobenour@xxxxxxxxx> · Wed, 31 Aug 2022 23:37:12 -0400



On 8/31/22 09:24, Baptiste Jonglez wrote:
> Hello,
> 
> I'm trying to multiplex many simultaneous SSH connections through a single
> master connection, and I'm hitting a race condition while doing this.
> This is not a bug; I'm either hitting a limit in the design of OpenSSH or
> misusing it.
> 
> The use-case is to use Ansible to configure many hosts simultaneously,
> while all connections need to go through a single "SSH bastion" via ProxyJump.
> For efficiency and to avoid hitting MaxStartups limits, I would like to
> use a control master for the connection to the bastion, via the following
> client configuration:
> 
>     Host bastion.example.com
>       ControlMaster auto
>       ControlPath /dev/shm/ssh-%h
>       ControlPersist 30
> 
>     Host !bastion.example.com *.example.com
>       ProxyJump bastion.example.com
> 
> However, this does not work when making simultaneous connections: all SSH
> connections create a new, separate connection to the bastion.  Here is a
> simple way to reproduce:
> 
>     $ for i in {1..3}; do ssh myhost.example.com "sleep 1" & done
>     ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing
>     ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing
> 
> What happens is the following:
> 
> 1) each SSH process tries to connect to the control socket and fails
>    (this is expected, the control socket is not yet bound)
> 
> 2) each SSH process then creates a new SSH connection
> 
> 3) once connected, each process tries to bind to the control socket
> 
> 4a) one process successfully binds the control socket
> 4b) all other processes fail to bind the control socket (error message above)
> 
> 5) in both cases, each process is now using its own separate SSH connection to the bastion
> 
> The window for the race condition is between 1) and 4), so it's rather
> large: it includes the time to establish a new SSH connection.
> 
> I believe that taking a lock between steps 1) and 4) could solve the issue:
> 
> 1.1) each process tries to take an exclusive lock related to the control socket
> 1.1a) one process gets the lock and can continue creating an SSH connection
> 1.1b) all other processes wait on the lock; when the lock is released, they
>       go back to step 1) to connect to the control socket
> 
> 4.1) once the control socket has been bound, the "lucky process" releases the lock
> 
> Does it make sense?  Would the project accept a patch implementing this as
> an additional option?
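
To make the proposed locking steps concrete, here is a rough sketch in Python.
It is only a simulation, not OpenSSH code: a plain file stands in for the
control socket, a sleep stands in for establishing the SSH connection, and the
names connect_via_master, run_clients and the ".lock" suffix are made up for
illustration. It assumes a Unix-like system where flock(2) is available.

```python
import fcntl
import os
import tempfile
import threading
import time


def connect_via_master(control_path, lock_path, connect_delay=0.2):
    """Return "master" if this caller created the control socket, else "mux"."""
    # Step 1: fast path, a master is already listening.
    if os.path.exists(control_path):
        return "mux"
    # Step 1.1: would-be masters serialize on an exclusive lock.
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Back to step 1: another process may have become master
            # while we were waiting on the lock.
            if os.path.exists(control_path):
                return "mux"
            # Steps 2-4: stand-in for establishing the SSH connection
            # and binding the control socket.
            time.sleep(connect_delay)
            with open(control_path, "x"):
                pass
            return "master"
        finally:
            # Step 4.1: release the lock once the socket exists.
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_clients(n=3):
    """Start n simultaneous clients and return their sorted roles."""
    with tempfile.TemporaryDirectory() as tmp:
        control_path = os.path.join(tmp, "ssh-bastion")
        lock_path = control_path + ".lock"
        roles = []
        roles_guard = threading.Lock()

        def client():
            role = connect_via_master(control_path, lock_path)
            with roles_guard:
                roles.append(role)

        threads = [threading.Thread(target=client) for _ in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(roles)


if __name__ == "__main__":
    print(run_clients(3))  # ['master', 'mux', 'mux']
```

With the lock in place, exactly one client ends up creating the connection and
the others reuse it, at the cost of the waiters blocking for the duration of
the first connection attempt. A real implementation would also have to handle
the lock holder failing to connect.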

Not sure if this is related, but I would like to have an option to *only* use the
control socket.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev