Re: [PATCH 12/20] commit-graph: check size of commit data chunk

Taylor Blau <me@xxxxxxxxxxxx> · Wed, 11 Oct 2023 14:46:28 -0400

On Mon, Oct 09, 2023 at 05:05:36PM -0400, Jeff King wrote:
> We expect a commit-graph file to have a fixed-size data record for each
> commit in the file (and we know the number of commits to expct from the
> size of the lookup table). If we encounter a file where this is too
> small, we'll look past the end of the chunk (and possibly even off the
> mapped memory).
>
> We can fix this by checking the size up front when we record the
> pointer.
>
> The included test doesn't segfault, since it ends up reading bytes
> from another chunk. But it produces nonsense results, since the values
> it reads are garbage. Our test notices this by comparing the output to a
> non-corrupted run of the same command (and of course we also check that
> the expected error is printed to stderr).
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  commit-graph.c          | 12 +++++++++++-
>  t/t5318-commit-graph.sh |  9 +++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 472332f603..9b80bbd75b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -340,6 +340,16 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
>  	return 0;
>  }
>
> +static int graph_read_commit_data(const unsigned char *chunk_start,
> +				  size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)

Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is
defined as (the_hash_algo->rawsz + 16), so I *think* that the expression
in the parenthesis would get done as a size_t, and then g->num_commits
would be widened to a size_t for the purposes of evaluating this
expression.

So I think that this is all OK in the sense that we'd never underflow
the 64-bit space, and having more than 2^64-1/36 (some eighteen
quintillion) commits is... unlikely ;-).

But it may be worth wrapping this computation in an st_mult() anyway to
avoid future readers having to think about this.

> +		return error("commit-graph commit data chunk is wrong size");
> +	g->chunk_commit_data = chunk_start;
> +	return 0;
> +}
> +
>  static int graph_read_bloom_data(const unsigned char *chunk_start,
>  				  size_t chunk_size, void *data)
>  {
> @@ -422,7 +432,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>
>  	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
>  	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
> -	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
> +	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);

Here again would be a good use-case for a `pair_chunk_expect()`
function, but I don't want to beat a dead horse ;-).

Thanks,
Taylor