Re: [PATCH] Verify Content-Type from smart HTTP servers

Jeff King <peff@xxxxxxxx> · Fri, 1 Feb 2013 03:52:49 -0500

On Thu, Jan 31, 2013 at 02:09:40PM -0800, Junio C Hamano wrote:

> Before parsing a suspected smart-HTTP response verify the returned
> Content-Type matches the standard. This protects a client from
> attempting to process a payload that smells like a smart-HTTP
> server response.
> 
> JGit has been doing this check on all responses since the dawn of
> time. I mistakenly failed to include it in git-core when smart HTTP
> was introduced. At the time I didn't know how to get the Content-Type
> from libcurl. I punted, meant to circle back and fix this, and just
> plain forgot about it.
> 
> Signed-off-by: Shawn Pearce <spearce@xxxxxxxxxxx>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>

Should this be "From:" Shawn? The tone of the message and the S-O-B
order makes it look like it.

> @@ -133,16 +135,19 @@ static struct discovery* discover_refs(const char *service)
>  	last->buf = last->buf_alloc;
>  
>  	if (maybe_smart && 5 <= last->len && last->buf[4] == '#') {
> -		/* smart HTTP response; validate that the service
> +		/*
> +		 * smart HTTP response; validate that the service
>  		 * pkt-line matches our request.
>  		 */
> -		struct strbuf exp = STRBUF_INIT;
> -
> +		strbuf_addf(&exp, "application/x-%s-advertisement", service);
> +		if (strbuf_cmp(&exp, &type))
> +			die("invalid content-type %s", type.buf);

Hmm. I wondered if it is possible for a non-smart server to send us down
this code path, which would now complain of the bogus content-type.
Something like an info/refs file with:

  # 1
  # the comment above is meaningless, but puts a "#" at position 4.

But I note that we would already die in the next line:

>  		if (packet_get_line(&buffer, &last->buf, &last->len) <= 0)
>  			die("%s has invalid packet header", refs_url);

so I do not think the patch makes anything worse. However, should we
take this opportunity to make the "did we get a smart response" test
more robust? That is, should we actually be checking the content-type
in the outer conditional, and going down the smart code-path if it is
application/x-%s-advertisement, and otherwise treating the result as
dumb?

It's probably not a big deal, as the false positive example above is
quite specific and unlikely, but it just seems cleaner to me.

As a side note, should we (can we) care about the content-type for dumb
http? It should probably be text/plain or application/octet-stream, but
I would not be surprised if we get a variety of random junk in the real
world, though.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html