On Tue, Oct 13, 2020 at 11:45:02PM +0000, brian m. carlson wrote:

> Yeah, I was about to mention the same thing. It looks like we cover
> only a subset of requests. Moreover, I think this feature is going to
> practically fail in some cases and we need to either document that
> clearly or abandon this effort.
>
> In remote-curl.c, we have post_rpc, which does a POST request to upload
> data for a push. However, if the data is larger than the buffer, we
> stream it using chunked transfer-encoding. Because we're reading from a
> pipe, that data cannot be retried: the pack-objects stream will have
> ended.

Right, this is the large_request code path there. We use the same
function for fetches, too, though perhaps it's less likely that a
negotiation request will exceed the post buffer size.

We do send a probe rpc in this case, which lets us handle an HTTP 401
auth request. We _could_ retry on errors to the probe rpc, but I'm not
sure it would accomplish that much. The interesting thing is whether the
actual request with content goes through. If retrying magically fixes
things, there's no reason to think that the actual request is any less
likely to fail intermittently than the probe request (in fact, I'd
expect it to fail more).

It would be possible to restart even these large requests. Obviously we
could spool the contents to disk in order to replay them. But that
carries a cost for the common case in which we either succeed or
definitely fail on the first attempt, and never use the spooled copy.

Another option is to simply restart the Git process that is generating
the data we're piping. But that has to happen outside of post_rpc();
only the caller knows the right way to invoke it again. And we kind of
already have that: just run the Git command again. I know that sounds a
little dismissive, but it has really been our recommended retry strategy
for ages[1]. So I'd wonder, in Sean's use case, why just restarting the
whole Git process isn't a workable solution.
It wouldn't respect Retry-After, but it might be reasonable to surface
that header's value to the caller so it can act appropriately (and
likewise to indicate whether we saw an error that implies retrying might
work).

All of this is written from the perspective of protocol v1. In v2, we do
a lot more blind packet-shuffling (see remote-curl.c:stateless_connect()).
I suspect that makes any kind of retry at the level of the http code
much harder, whereas just restarting the Git command would probably work
just fine.

-Peff

[1] I think git-submodule will retry failed clones, for example. TBH, I
have never once seen this retry accomplish anything; it only wastes time
and makes the output more confusing (since we see the failure twice). I
have to admit I'm not thrilled to see more blind retrying for that
reason.