Hi Peff, On Sat, Mar 28, 2020 at 11:49:36AM -0400, Jeff King wrote: > On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote: > > > On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote: > > > > > But since it works for v1 (which also dies!), and since you were able to > > > reproduce the problem locally, I suspect it may be an issue in > > > upload-pack itself. > > > > Actually, I think the problem is on the client side. > > > > If I run your test without the http-backend change, then nothing is left > > running on the server side, but on the client we have two processes: > > git-clone and the git-remote-https helper. And they're deadlocked trying > > to read from each other. > > > > I think the issue is that git-remote-https in v2 mode calls into > > stateless_connect(), and just pumps http request/response pairs from > > git-clone to the server. But if a request generates an empty response, > > then clone has no idea that anything was received at all. It continues > > waiting, but remote-https has looped to expect another request from it. > > > > Your patch fixes _this case_ because it causes the server to send a > > non-empty response. But the client can't rely on that. Besides that not > > being a documented server behavior, in the worst case the connection > > could be severed mid-stream. So I think remote-https needs to complain > > about an empty response. This isn't a problem in v1 because it would > > actually try to look at that empty response; in v2 it's just proxying > > bytes around. > > Ugh, it's actually much worse than this. We dealt with the > empty-response case in 296b847c0d (remote-curl: don't hang when a server > dies before any output, 2016-11-18). The issue here is that we got a > _partial_ response, and clone is waiting for the rest of it. > > The server said "0011shallow-info\n" before it bailed. So from the > perspective of git-clone, it's now waiting to read through a flush > packet. But remote-https has nothing left to send. The root of the issue > is that it has no way to signal to clone that it has already sent > everything the server gave it. In non-helper code, clone would see the > EOF. And in v1 https, I think there's some extra framing between > remote-https and "fetch-pack --stateless-rpc" so that EOF effectively > gets passed along. But v2's stateless_connect() strategy is > fundamentally missing this signal, and there are probably other spots in > the protocol that could cause similar deadlocks. > > I wonder if remote-https's stateless_connect() could complain if there's > no flush packet in the output it writes back. That would certainly fix > this case, but I'd worry there are rpc messages that don't end in a > flush. And it would be susceptible to data cut-offs that happened to > look like a flush packet. > > I think the robust solution is to introduce an extra level of pktline > framing into the remote-helper stateless-connect protocol. Thanks for doing some more investigation and correcting my faulty analysis. I'll use this information to try and create a new patch. -Denton > -Peff