On Sat, Mar 28, 2020 at 11:13:00AM -0400, Jeff King wrote: > On Sat, Mar 28, 2020 at 10:57:42AM -0400, Jeff King wrote: > > > But since it works for v1 (which also dies!), and since you were able to > > reproduce the problem locally, I suspect it may be an issue in > > upload-pack itself. > > Actually, I think the problem is on the client side. > > If I run your test without the http-backend change, then nothing is left > running on the server side, but on the client we have two processes: > git-clone and the git-remote-https helper. And they're deadlocked trying > to read from each other. > > I think the issue is that git-remote-https in v2 mode calls into > stateless_connect(), and just pumps http request/response pairs from > git-clone to the server. But if a request generates an empty response, > then clone has no idea that anything was received at all. It continues > waiting, but remote-https has looped to expect another request from it. > > Your patch fixes _this case_ because it causes the server to send a > non-empty response. But the client can't rely on that. Besides that not > being a documented server behavior, in the worst case the connection > could be severed mid-stream. So I think remote-https needs to complain > about an empty response. This isn't a problem in v1 because it would > actually try to look at that empty response; in v2 it's just proxying > bytes around. Ugh, it's actually much worse than this. We dealt with the empty-response case in 296b847c0d (remote-curl: don't hang when a server dies before any output, 2016-11-18). The issue here is that we got a _partial_ response, and clone is waiting for the rest of it. The server said "0011shallow-info\n" before it bailed. So from the perspective of git-clone, it's now waiting to read through a flush packet. But remote-https has nothing left to send. The root of the issue is that it has no way to signal to clone that it has already sent everything the server gave it. In non-helper code, clone would see the EOF. And in v1 https, I think there's some extra framing between remote-https and "fetch-pack --stateless-rpc" so that EOF effectively gets passed along. But v2's stateless_connect() strategy is fundamentally missing this signal, and there are probably other spots in the protocol that could cause similar deadlocks. I wonder if remote-https's stateless_connect() could complain if there's no flush packet in the output it writes back. That would certainly fix this case, but I'd worry there are rpc messages that don't end in a flush. And it would be susceptible to data cut-offs that happened to look like a flush packet. I think the robust solution is to introduce an extra level of pktline framing into the remote-helper stateless-connect protocol. -Peff