Re: [PATCH v6 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

Jeff King <peff@xxxxxxxx> · Tue, 28 Nov 2017 22:22:14 -0500

On Sun, Nov 26, 2017 at 09:38:12PM +0200, Max Kirillov wrote:

> From: Florian Manschwetus <manschwetus@xxxxxxxxxxxxxxxxxxx>
> Date: Wed, 30 Mar 2016 10:54:21 +0200
> 
> http-backend reads whole input until EOF. However, the RFC 3875 specifies
> that a script must read only as many bytes as specified by CONTENT_LENGTH
> environment variable. Web server may exercise the specification by not closing
> the script's standard input after writing content. In that case http-backend
> would hang waiting for the input. The issue is known to happen with
> IIS/Windows, for example.
> 
> Make http-backend read only CONTENT_LENGTH bytes, if it's defined, rather than
> the whole input until EOF. If the variable is not defined, keep older behavior
> of reading until EOF because it is used to support chunked transfer-encoding.

I missed out on review of the earlier iterations from the past few days,
but I looked this over in light of the comments I made long ago in:

  https://public-inbox.org/git/20160329201349.GB9527@xxxxxxxxxxxxxxxxxxxxx/

The concerns I had there were:

  1. It did the wrong thing when CONTENT_LENGTH was not present in the
     environment. Your version here gets this right.

  2. I earlier was worried that this wouldn't kick in for the
     inflate_request() code path. It looks like we do use read_request()
     from inflate_request(), but only when buffer_input is true. Which
     means we wouldn't do so for receive-pack (i.e., for pushes).

     I'm not sure if the client would ever gzip a push request. Without
     double-checking, I suspect it would if the list of refs to update
     was sufficiently large (and at any rate, I think other clients
     potentially could do so).

     That _might_ be OK in practice. If the gzip stream is well-formed,
     we'd stop at its end. We'd possibly still try to read() more bytes
     than were promised, but if I understand the original problem,
     that's not that big a deal as long as we don't hang waiting for
     EOF.

     If the stream isn't well-formed, we'd hang waiting for more bytes
     (rather than seeing the EOF and complaining that the gzip stream is
     truncated). That's poor, but no worse than the current behavior.

  3. For large inputs (like incoming packfiles), we connect the
     descriptor directly to index-pack or unpack-objects, and they try
     to read to EOF.

     For a well-formed pack, I _think_ this would work OK. We'd see the
     end of the pack and quit (there's a check for garbage at the end of
     the pack, but it triggers only for the non-pipe case).

     For a truncated input, we'd hang forever rather than report an
     error.

So I suspect there are lurking problems that may trigger in corner
cases. That said, I don't think this pack makes any case _worse_, and it
may make some common ones better. So I'm not opposed, though we may be
giving people a false sense of security that it actually works robustly
on IIS.

> ---
>  config.c       |  2 +-
>  config.h       |  1 +
>  http-backend.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 51 insertions(+), 2 deletions(-)

The patch itself looks OK, modulo the comments already made by others.
Two minor observations, and a possible bug:

> +static ssize_t read_request_fixed_len(int fd, ssize_t req_len, unsigned char **out)
> +{

I did wonder if this "stop at n bytes" could simply be rolled into the
existing read_request loop (by limiting the length we pass to
read_in_full there). But it may be cleaner to just have a separate
function. There's some repetition, but not much since we can rely on a
single malloc and read_in_full() for this case.

> +	unsigned char *buf = NULL;
> +	ssize_t cnt = 0;
> +
> +	if (max_request_buffer < req_len) {
> +		die("request was larger than our maximum size (%lu): %lu;"
> +			    " try setting GIT_HTTP_MAX_REQUEST_BUFFER",
> +			    max_request_buffer,
> +			    req_len);
> +	}
> +
> +	if (req_len <= 0) {
> +		*out = NULL;
> +		return 0;
> +	}

I was slightly surprised by "<= 0" here. We should never get here with a
negative req_len, since we'd catch that in the read_request() wrapper
here. If we want to document that assumption, should this be
assert(req_len >= 0)?

I'm also puzzled about the behavior with a zero-byte CONTENT_LENGTH.
We'd return NULL here. But in the other read_request_eof() path, we'd
end up with a non-NULL pointer. I'm not sure if it matters to the caller
or not, but it seems like a potential trap.

Is it even worth special-casing here? Our xmalloc and read_in_full
wrappers should handle the zero-byte just fine. I think this whole
conditional could just go away.

-Peff