> -----Original Message-----
> From: Jeff King [mailto:peff@xxxxxxxx]
> Sent: Monday, April 3, 2017 10:02 PM
> To: David Turner <David.Turner@xxxxxxxxxxxx>
> Cc: git@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values
>
> On Mon, Apr 03, 2017 at 05:03:49PM +0000, David Turner wrote:
>
> > > > Unfortunately, in order to push some large repos, the http
> > > > postbuffer must sometimes exceed two gigabytes. On a 64-bit system,
> > > > this is OK: we just malloc a larger buffer.
> > >
> > > I'm still not sure why a 2GB post-buffer is necessary. It sounds
> > > like something is broken in your setup. Large pushes should be sent
> > > chunked.
> > >
> > > I know broken setups are a fact of life, but this feels like a
> > > really hacky work-around.
> >
> > I'm not sure what other workaround I should use. I guess I could do
> > multiple pushes, but only if individual objects are under the size
> > limit, and I'm not sure all of mine are (maybe I'll get lucky tho). I
> > know that this is a configuration issue with gitlab:
> > https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know
> > when that will get fixed. I could manually copy the repo to the
> > server and do a local push, but I don't know that I have the necessary
> > permissions to do that. Or I could do this, which would hopefully
> > actually solve the problem.
>
> I didn't think we had gotten details on what was actually broken. Is it
> really that GitLab does not support chunked transfers? I know that's what
> the issue above says, but it sounds like it is just assuming that is the
> problem based on the recent messages to the list.

I agree that we don't know for sure what's actually broken. The GitLab bug
tracker is probably the better place to figure that out, unless GitLab
thinks they're hitting a bug in git.

> If that's really the issue, then OK. That's lame, but something the client
> has to work around. It seems like a pretty big gap, though (and one I'd
> think people would have hit before; the default post-buffer is only 1MB.
> Surely people routinely push more than that to GitLab servers?)
>
> So I'm really wondering if there is something else going on here.
>
> What does it look like when it fails? What does GIT_TRACE_CURL look like
> (or GIT_CURL_VERBOSE if your client is older, but remember to sanitize any
> auth lines)?

Unfortunately, we've already worked around the problem by pushing over SSH,
so I no longer have a failing case to examine.

Last time I tried, I actually did some hackery to create a push smaller than
2GB, but it still failed (this time, with "502 Bad Gateway"). So, something
is clearly weird in GitLab land.

I did see "Transfer-Encoding: chunked" in one of the responses from the
server, but not in the request (not sure if that's normal).

The smaller push had:

  Content-Length: 1048908476

(For me to publish longer log traces would require a level of security
review which is probably too much of a hassle, unless you think it will be
really useful.)

> > > The ultimate fate of this number, though, is to be handed to:
> > >
> > >   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> > >
> > > where the final argument is interpreted as a long. So I suspect that
> > > on 64-bit Windows, setting http.postbuffer to "3G" would cause some
> > > kind of weird error (either a truncated post or some internal curl
> > > error due to the negative size, depending on how curl handles it).
> >
> > Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.
> > Will re-roll.
>
> Ah, neat. I didn't even know about CURLOPT_POSTFIELDSIZE_LARGE, and
> thought we'd have to just limit 32-bit platforms. That's a much better
> solution.
>
> > > I saw the earlier iteration used a size_t, but you switched it after
> > > the compiler (rightfully) complained about the signedness. But I'm
> > > not sure why we would want ssize_t here instead of just using
> > > git_parse_unsigned().
> >
> > It was originally signed. I'm not sure why that was, but I figured it
> > would be simpler to save the extra bit just in case.
>
> I think it was simply because git_config_int() is the generic "number"
> parser, and nobody ever thought about it.
>
> In fact, passing a negative value to curl would be disastrous, as it
> would use strlen(). I don't think a negative value could ever get that
> far, though. It looks like the config code would silently turn a negative
> value into LARGE_PACKET_MAX.

I would still prefer to preserve the bit in case we ever decide that a
negative value should have some special meaning later. Of course, that
special meaning wouldn't be "pass directly to curl". (As I think about
git's http protocol, and how hard it is to change it, I always want to
leave as many extra bits as possible free for general future use.)

> IMHO, complaining about the negative number to the user would be an
> improvement.

That seems reasonable.
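
For concreteness, here is roughly the shape of change I have in mind for
the re-roll, building on the curl_easy_setopt() call quoted above. This is
only an untested sketch; the explicit negative-value check and the cast are
my assumptions about how it could look, not the final patch:

  /*
   * Sketch: use the 64-bit variant so a POST larger than 2GB is not
   * truncated on platforms where long is 32 bits (e.g. 64-bit Windows).
   * CURLOPT_POSTFIELDSIZE_LARGE takes a curl_off_t, which is typically
   * 64 bits wherever curl supports large files.
   */
  if (rpc->len < 0)
          die("BUG: attempted to POST a negative amount of data");
  curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE,
                   (curl_off_t)rpc->len);

On the config side, that would pair with rejecting (or at least warning
about) a negative http.postbuffer instead of silently clamping it, per the
suggestion above.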