Re: [PATCH 05/12] http: simplify parsing of remote objects/info/packs

René Scharfe <l.s.r@xxxxxx> · Fri, 5 Apr 2019 12:41:27 +0200

Am 05.04.2019 um 01:27 schrieb Jeff King:
> We can use skip_prefix() and parse_oid_hex() to continuously increment
> our pointer, rather than dealing with magic numbers. This also fixes a
> few small shortcomings:
>
>   - if we see a 'P' line that does not match our expectations, we'll
>     leave our "i" counter in the middle of the line. So we'll parse:
>     "P P P pack-1234.pack" as if there was just one "P" which was not
>     intentional (though probably not that harmful).

How so?  The default case, which we'd fall through to, skips the rest
of such a line, doesn't it?

>
>   - if we see a line with the right prefix, suffix, and length, i.e.
>     matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
>     hex without checking if it could be parsed. This could lead to us
>     looking at uninitialized garbage in the hash array. In practice this
>     means we'll just make a garbage request to the server which will
>     fail, though it's interesting that a malicious server could convince
>     us to leak 40 bytes of uninitialized stack to them.
>
>   - the current code is picky about seeing a newline at the end of file,
>     but we can easily be more liberal
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  http.c | 35 ++++++++++++++---------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
>
> diff --git a/http.c b/http.c
> index a32ad36ddf..2ef47bc779 100644
> --- a/http.c
> +++ b/http.c
> @@ -2147,11 +2147,11 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head,
>  int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  {
>  	struct http_get_options options = {0};
> -	int ret = 0, i = 0;
> -	char *url, *data;
> +	int ret = 0;
> +	char *url;
> +	const char *data;
>  	struct strbuf buf = STRBUF_INIT;
> -	unsigned char hash[GIT_MAX_RAWSZ];
> -	const unsigned hexsz = the_hash_algo->hexsz;
> +	struct object_id oid;
>
>  	end_url_with_slash(&buf, base_url);
>  	strbuf_addstr(&buf, "objects/info/packs");
> @@ -2163,24 +2163,17 @@ int http_get_info_packs(const char *base_url, struct packed_git **packs_head)
>  		goto cleanup;
>
>  	data = buf.buf;
> -	while (i < buf.len) {
> -		switch (data[i]) {
> -		case 'P':
> -			i++;
> -			if (i + hexsz + 12 <= buf.len &&
> -			    starts_with(data + i, " pack-") &&
> -			    starts_with(data + i + hexsz + 6, ".pack\n")) {
> -				get_sha1_hex(data + i + 6, hash);
> -				fetch_and_setup_pack_index(packs_head, hash,
> -						      base_url);
> -				i += hexsz + 11;
> -				break;
> -			}
> -		default:
> -			while (i < buf.len && data[i] != '\n')
> -				i++;
> +	while (*data) {
> +		if (skip_prefix(data, "P pack-", &data) &&
> +		    !parse_oid_hex(data, &oid, &data) &&
> +		    skip_prefix(data, ".pack", &data) &&
> +		    (*data == '\n' || *data == '\0')) {
> +			fetch_and_setup_pack_index(packs_head, oid.hash, base_url);
> +		} else {
> +			data = strchrnul(data, '\n');
>  		}
> -		i++;
> +		if (*data)
> +			data++; /* skip past newline */

So much simpler, *and* converted to object_id -- I like it!

Parsing "P" and "pack-" together crosses logical token boundaries,
but that I don't mind it here.

René