Re: [PATCH 2/2] http-backend: spool ref negotiation requests to buffer

Jeff King <peff@xxxxxxxx> · Fri, 15 May 2015 15:16:46 -0400

On Fri, May 15, 2015 at 11:22:42AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > The solution is fairly straight-forward: we read the request
> > body into an in-memory buffer in http-backend, freeing up
> > Apache, and then feed the data ourselves to upload-pack. But
> > there are a few important things to note:
> >
> >   1. We limit in-memory buffer to no larger than 1 megabyte
> >      to prevent an obvious denial-of-service attack. This
> >      is a new hard limit on requests, but it's likely that
> >      requests of this size didn't work before at all (i.e.,
> >      they would have run into the pipe buffer thing and
> >      deadlocked).

So this 1MB limit is clearly a problem, and the reasoning above is not
right. The case we are helping is when a large amount of input creates a
large amount of output. But we're _hurting_ the case where there's just
a large amount of input (as shown by the Dennis's test case).

What do we want to do about that? We can switch to streaming after
hitting our limit (so opening the opportunity for deadlock again in some
cases, but making sure we do no harm to cases that currently work). Or
we can just bump the input size and say "you'd be crazy to send more
than 10MB" (or 50, or whatever). We could make a configuration knob,
too, I guess.

> One unrelated thing I noticed was that three codepaths independently
> have close(0) in run_service() now, and made me follow the two
> helper functions to see they both do the close at the end.  It might
> have made the flow easier to follow if run_service() were
> 
>     ...
>     close(1);
>     if (gzip)
>     	inflate();
>     else if (buffer)
>         copy();
>     close(0);
>     ...
> 
> But that is minor.

I don't see the close(0) in the other (buffered) code paths. We close
the _output_ to the child, but of course we have to do that to tell it
we're done sending (I actually forgot it in an earlier version of
copy_request(), and things hang :) ).

I don't think there's any need to close(0) in the buffered cases. We
read until EOF in the copy() case. For gzip, we read until the end of
the gzipped data. I guess it would be better to close if we're not
expecting more input, as otherwise Apache might block trying to write to
us if the client sends bogus input (i.e., a zlib stream with more cruft
at the end).

> Also, is it worth allocating small and then growing up to the maximum?
> I think this only relays one request at a time anyway, and I suspect
> that a single 1MB allocation at the first call kept getting reused
> may be sufficient (and much simpler).

My initial attempt did exactly that, but I had a much smaller buffer. I
started to get worried around 1MB. If we bump it to 10MB (or make it
configurable), I get more so. I dunno. It is not _that_ much memory, but
it is per-request we are serving, so it might add up on a busy server.
OTOH, pack-objects thinks nothing of allocating 800MB just for the
book-keeping to serve a clone of torvalds/linux.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html