On Tue, May 18, 2021 at 11:46:17PM -0400, Greg Pflaum wrote:

> Git's handling of the SSH session during "git clone" changed between Git
> 2.17.0 and 2.31.1, causing cloning of a large repo to fail when the server
> closes the idle session during the "Resolving deltas" phase.

Interesting find. During that phase, all communication with the server
is finished. We're not expecting it to say anything else, and I'd have
actually expected us to have hung up the connection.

But I can definitely reproduce the issue by fetching even a moderately
sized repository like git.git, and observing that ssh is still running
when we hit the "Resolving deltas" phase. And indeed, killing it at that
step shows the problem:

  $ git clone --bare git@xxxxxxxxxx:git/git foo.git
  Cloning into bare repository 'foo.git'...
  remote: Enumerating objects: 307777, done.
  remote: Counting objects: 100% (114/114), done.
  remote: Compressing objects: 100% (48/48), done.
  remote: Total 307777 (delta 72), reused 94 (delta 66), pack-reused 307663
  Receiving objects: 100% (307777/307777), 159.14 MiB | 30.71 MiB/s, done.
  Resolving deltas: 100% (229729/229729), done.
  error: ssh died of signal 9

In another terminal I waited for it to hit "resolving" and ran:

  kill -9 $(ps ax | grep ssh | grep github | awk '{print $1}')

You can see that we did successfully receive the entire incoming pack.
It's just that we then do an over-eager check of ssh's exit code and
complain when disconnecting the transport.

I had a hunch that this was related to the v2 protocol (which became the
default between the two versions you mentioned). And indeed, running
"git clone -c protocol.version=0 ..." makes it go away.

So I'd guess that in the v0 protocol, we close the pipe going to ssh's
stdin (which does a half-duplex shutdown, and then when the server side
closes its pipe, ssh exits completely). But in v2, we presumably don't.

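To make that guess a bit more concrete, here is a rough Python sketch of
the pipe behavior I have in mind. It is not Git's code: `cat` is only a
stand-in for ssh (it exits once its stdin reaches EOF), the function and
parameter names are made up for illustration, and it assumes a POSIX
system with `cat` on the PATH.

```python
import subprocess


def run_conversation(close_stdin_when_done):
    """Talk to a child over pipes, then report whether it is still alive.

    `cat` is a hypothetical stand-in for ssh: it echoes what it reads
    and exits only when its stdin reaches EOF.
    """
    child = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )

    # One request/response pair, standing in for the final fetch.
    child.stdin.write(b"final request\n")
    child.stdin.flush()
    reply = child.stdout.readline()

    if close_stdin_when_done:
        # Half-close: the child sees EOF on stdin and exits on its own.
        child.stdin.close()
        child.wait(timeout=5)

    # This is the point where local post-processing (the analogue of
    # "Resolving deltas") would run. Is the child still hanging around?
    still_running = child.poll() is None

    # Clean up so the sketch never leaks a process.
    if still_running:
        child.stdin.close()
        child.wait(timeout=5)
    child.stdout.close()

    return reply, still_running, child.returncode
```

With close_stdin_when_done=True the child should already be gone by the
time the local work would start; with False it lingers until something
else tears it down, which is the window in which a server-side idle
timeout (or a stray kill) can hit it.
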
Which is not too surprising; v2's view of ssh is much more as a
transport over which it will make several request/response pairs. So the
caller would have to explicitly indicate that this is the final request,
and the transport can be terminated after that. That doesn't seem too
complex conceptually, but I worry implementing it will run into
conflicts with how the v2 code works.

Another side issue is that once the protocol conversation has finished,
I'm not sure if it's really useful for us to detect and complain about
ssh's exit code. We know the other side completed the conversation
successfully, and we have nothing left to ask it. So a fix for your
immediate pain would be to stop noticing that. I think the root issue is
still worth addressing, though; we are tying up network and local
resources with a useless to-be-closed ssh connection.

-Peff