Re: [PATCH v4 05/14] commit-graph: add base graphs chunk

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 07 Jun 2019 11:15:49 -0700

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> -  1-byte (reserved for later use)
> -     Current clients should ignore this value.
> +  1-byte number (B) of base commit-graphs
> +      We infer the length (H*B) of the Base Graphs chunk
> +      from this value.
>  
>  CHUNK LOOKUP:
>  
> @@ -92,6 +93,12 @@ CHUNK DATA:
>        positions for the parents until reaching a value with the most-significant
>        bit on. The other bits correspond to the position of the last parent.
>  
> +  Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
> +      This list of H-byte hashes describe a set of B commit-graph files that
> +      form a commit-graph chain. The graph position for the ith commit in this
> +      file's OID Lookup chunk is equal to i plus the number of commits in all
> +      base graphs.  If B is non-zero, this chunk must exist.

Hmph, an obvious alternative design would be to make the base list
self describing without using the "reserved for future use" byte,
which would allow more than 256 bases (not that being able to use
300 bases is necessarily a useful feature) and also leave the
reserved byte unused.

It's not like being able to detect discrepancy (e.g. B!=0 but BASE
chunk is missing, and/or BASE chunk appears but B==0) adds value by
offering more protection against file corruption, so I am wondering
why it is a good idea to consume the reserved byte for this.

> +	if (n && !g->chunk_base_graphs) {
> +		warning(_("commit-graph has no base graphs chunk"));
> +		return 0;
> +	}
> +
n>  	while (n) {
>  		n--;
> +
> +		if (!oideq(&oids[n], &cur_g->oid) ||
> +		    !hasheq(oids[n].hash, g->chunk_base_graphs + g->hash_len * n)) {

Here, load_commit_graph_chain() that goes over the on-disk chain
file that lists graph files called us with 'n', which can run up to
the number of graph files listed in that file---and that number can
be more than what is recorded in the graph-list chunk, in which case
we are over-reading with this hasheq(), right?

It seems that parse_commit_graph() only cares about the beginning of
each chunk, and a crafted graph file can record two chunks with a
gap in between, or two chunks that overlap, and nobody would notice.
Is that true?

Wasted space in the file between two chunks (i.e. a gap) is not
necessarily bad and may not be a warning-worthy thing, but two
chunks that overlap is probably not a good idea and worth noticing.
The only sanity check it seems to do is to forbid chunks of the same
kind from appearing twice.