Re: [PATCH] http-backend: allow empty CONTENT_LENGTH

Jeff King <peff@xxxxxxxx> · Thu, 6 Sep 2018 23:38:31 -0400

On Fri, Sep 07, 2018 at 06:27:40AM +0300, Max Kirillov wrote:

> On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote:
> > Max Kirillov <max@xxxxxxxxxx> writes:
> >> This should fix it. I'm not sure whether it should be treated as 0 or "-1".
> >> At least the tests mentioned by Jeff fail if I try to treat a missing
> >> CONTENT_LENGTH as "-1", so keep the existing behavior as much as possible.
> > 
> > I am not sure what you mean by the above, between 0 and -1.  The
> > code signals the caller of get_content_length() that req_len is -1
> > which is used as a sign to read through to the EOF, so it appears to
> > me that the code treats missing content-length (i.e. str == NULL
> > case) as "-1".
> 
> I made a mistake here; it should be "if I try to treat missing
> CONTENT_LENGTH as 0". This is, as far as I understand, what the
> RFC specifies.
> 
> That is, after the following change, the test "large fetch-pack
> requests can be split across POSTs" from t5551 starts failing:
> 
> -- >8 --
> @@ -353,8 +353,12 @@ static ssize_t get_content_length(void)
>         ssize_t val = -1;
>         const char *str = getenv("CONTENT_LENGTH");
>  
> -       if (str && *str && !git_parse_ssize_t(str, &val))
> -               die("failed to parse CONTENT_LENGTH: %s", str);
> +       if (str && *str) {
> +               if (!git_parse_ssize_t(str, &val))
> +                       die("failed to parse CONTENT_LENGTH: %s", str);
> +       } else
> +               val = 0;
> +

Right, I'm pretty sure it is a problem if you treat a missing
CONTENT_LENGTH as "present, but zero", because chunked encodings from
Apache really do want us to read until EOF.

My understanding from Jelmer's report is that a present-but-empty
variable should be counted as "0" to mean "do not read any body bytes".
That matches my reading of RFC 3875, which says:

  If no data is attached, then NULL (or unset).

(and earlier they explicitly define NULL as the empty string). That
said, we do not do what they say for the "unset" case, and cannot
without breaking chunked encoding from Apache. So I don't know how much
we want to follow that RFC to the letter, but at least it makes sense to
me to revert the present-but-empty case back to what Git used to do, and
what the RFC says.

In other words, I think the logic we want is:

  if (!str) {
	/*
	 * RFC3875 says this must mean "no body", but in practice we
	 * receive chunked encodings with no CONTENT_LENGTH. Tell the
	 * caller to read until EOF.
	 */
	val = -1;
  } else if (!*str) {
	/*
	 * An empty length should be treated as "no body" according to
	 * RFC3875, and this seems to hold in practice.
	 */
	val = 0;
  } else {
	/*
	 * We have a CONTENT_LENGTH; trust what's in it as long as it
	 * can be parsed.
	 */
	if (!git_parse_ssize_t(str, &val))
		die(...);
  }

-Peff