Hello,

I'm trying to multiplex many simultaneous SSH connections through a single master connection, and I'm hitting a race condition while doing this. This is not a bug; I'm either hitting a limit in the design of OpenSSH or misusing it.

The use-case is to use Ansible to configure many hosts simultaneously, while all connections need to go through a single "SSH bastion" via ProxyJump. For efficiency and to avoid hitting MaxStartups limits, I would like to use a control master for the connection to the bastion, via the following client configuration:

    Host bastion.example.com
        ControlMaster auto
        ControlPath /dev/shm/ssh-%h
        ControlPersist 30

    Host !bastion.example.com *.example.com
        ProxyJump bastion.example.com

However, this does not work when making simultaneous connections: all SSH connections create a new, separate connection to the bastion. Here is a simple way to reproduce:

    $ for i in {1..3}; do ssh myhost.example.com "sleep 1" & done
    ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing
    ControlSocket /dev/shm/ssh-bastion.example.com already exists, disabling multiplexing

What happens is the following:

1) each SSH process tries to connect to the control socket and fails (this is expected, the control socket is not yet bound)
2) each SSH process then creates a new SSH connection
3) once connected, each process tries to bind to the control socket
4a) one process successfully binds the control socket
4b) all other processes fail to bind the control socket (error message above)
5) in both cases, each process is now using its own separate SSH connection to the bastion

The window for the race condition is between 1) and 4), so it's rather large: it includes the time to establish a new SSH connection.

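To make steps 1) to 5) above concrete, here is a small Python model of the check-then-bind sequence. It is only a sketch and not OpenSSH code: the function name `simulate_unlocked`, the counters, and the use of a thread barrier as a stand-in for the connection setup time are all assumptions made for illustration.

```python
import threading


def simulate_unlocked(n_clients):
    """Toy model of steps 1) to 5): check the socket, connect, then try to bind."""
    socket_bound = False
    guard = threading.Lock()  # protects the shared variables of this model only
    stats = {"connections": 0, "bind_ok": 0, "bind_failed": 0}
    # Stand-in for connection setup time: no client reaches step 3 before
    # every client has completed step 1.
    all_checked = threading.Barrier(n_clients)

    def client():
        nonlocal socket_bound
        with guard:
            found = socket_bound      # step 1: is there a control socket yet?
        if found:
            return                    # would reuse the master (never happens here)
        all_checked.wait()            # step 2: "connecting" to the bastion
        with guard:
            stats["connections"] += 1
            if not socket_bound:      # steps 3 and 4a: the first bind succeeds
                socket_bound = True
                stats["bind_ok"] += 1
            else:                     # step 4b: the socket already exists
                stats["bind_failed"] += 1

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


print(simulate_unlocked(3))
```

With three clients, the model reports three connections, one successful bind and two failed binds, which corresponds to the two "already exists" messages in the reproduction above.
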
I believe that taking a lock between steps 1) and 4) could solve the issue:

1.1) each process tries to take an exclusive lock related to the control socket
1.1a) one process gets the lock and can continue creating an SSH connection
1.1b) all other processes wait on the lock; when the lock is released, they go back to step 1) to connect to the control socket
4.1) once the control socket has been bound, the "lucky process" releases the lock

Does it make sense? Would the project accept a patch implementing this as an additional option?

Thanks,
Baptiste

--
Baptiste Jonglez
Research Engineer, Inria <https://www.inria.fr/>
STACK team <https://stack-research-group.gitlabpages.inria.fr/web/>
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
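
As a rough illustration of the locking scheme in steps 1.1) to 4.1) above, the following Python sketch serializes master creation with an advisory `flock` lock. It is not a patch and makes several assumptions: the lock file name (control path plus `.lock`), the function names, and the use of a regular file as a stand-in for the bound control socket are all hypothetical, and threads play the role of separate SSH processes.

```python
import fcntl
import os
import tempfile
import threading


def connect_with_lock(control_path, stats, guard):
    lock_path = control_path + ".lock"  # hypothetical lock file name
    while True:
        # Step 1: try the control socket first.
        if os.path.exists(control_path):
            with guard:
                stats["multiplexed"] += 1
            return
        # Step 1.1: take an exclusive lock tied to the control socket.
        with open(lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another client holds it
            if os.path.exists(control_path):
                # Step 1.1b: someone else created the master while we waited.
                # Leaving the "with" block closes the file and drops the lock.
                continue
            # Step 1.1a, then steps 2 to 4: create the connection and "bind".
            with guard:
                stats["connections"] += 1
            # Stand-in for bind(): fails if the path already exists.
            os.close(os.open(control_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        # Step 4.1: the lock was released when lock_file was closed.
        return


def run_clients(n_clients):
    stats = {"connections": 0, "multiplexed": 0}
    guard = threading.Lock()  # protects the counters only
    with tempfile.TemporaryDirectory() as tmp:
        control_path = os.path.join(tmp, "ssh-bastion.example.com")
        threads = [
            threading.Thread(target=connect_with_lock, args=(control_path, stats, guard))
            for _ in range(n_clients)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return stats


print(run_clients(3))
```

In this sketch the client that wins the lock re-checks the control path before connecting, so a client that waited on the lock falls back to step 1) instead of opening a second connection. With three clients it ends up with one new connection and two multiplexed clients.
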