Re: [PATCH v3 4/4] commit: rewrite read_graft_line

Jeff King <peff@xxxxxxxx> · Fri, 18 Aug 2017 02:43:36 -0400

On Fri, Aug 18, 2017 at 03:59:38AM +0200, Patryk Obara wrote:

> Determine the number of object_id's to parse in a single graft line by
> counting separators (whitespace characters) instead of dividing by
> length of hash representation.
> 
> This way graft parsing code can support different sizes of hashes
> without any further code adaptations.

Sounds like a reasonable approach, though I wonder what happens if our
counting pass differs in its behavior from the actual parse.

E.g., here:

> +	/* count number of blanks to determine size of array to allocate */
> +	for (i = 0, n = 0; i < line->len; i++)
> +		if (isspace(line->buf[i]))
> +			n++;

If we see multiple spaces like "1234abcd     5678abcd" we'll allocate a
slot for each. I think that's OK because here:

> +	for (i = 0; i < graft->nr_parent; i++)
> +		if (!isspace(*tail++) || parse_oid_hex(tail, &graft->parent[i], &tail))
>  			goto bad_graft_data;

We'd reject such an input totally (though as an interesting side effect,
you can convince the parser to allocate 20x as much RAM as you send it;
one oid for each space).

So we're probably fine. The two parsing passes are right next to each
other and are sufficiently simple and strict that we don't have to
worry about them diverging.

The single-pass alternative would probably be to read into a dynamic
structure like an oid_array, and then copy the result into the flex
structure.

Or of course to stop using a flex structure, as your original pass did.
I agree with Junio that the use of object_id's is orthogonal to using a
FLEX_ARRAY. But I could also see an argument that the complexity the
flex array adds here isn't worth the savings. The main benefits of a
flex array are:

  1. Less memory used. But we don't expect to see a large enough number
     of grafts for this to matter.

  2. A more compact memory representation, which can be faster. But
     accessing the parent list of a graft isn't going to be the hot code
     path. It probably only happens once in a program run when we
     rewrite the parents (versus checking the grafted commit, which is
     looked up once per commit we access).

  3. It's easier to free the struct and its associated resources in a
     single free(). But we never free the graft list.

Reading your original, my thought was "why _not_ keep doing it as a
FLEX_ARRAY, as it saves a little memory, which can't hurt". But seeing
this pre-counting phase, I think it does make the code a little more
complicated. But I'd be OK with doing it either way.

-Peff