On Mon, Oct 09, 2023 at 05:05:36PM -0400, Jeff King wrote: > We expect a commit-graph file to have a fixed-size data record for each > commit in the file (and we know the number of commits to expct from the > size of the lookup table). If we encounter a file where this is too > small, we'll look past the end of the chunk (and possibly even off the > mapped memory). > > We can fix this by checking the size up front when we record the > pointer. > > The included test doesn't segfault, since it ends up reading bytes > from another chunk. But it produces nonsense results, since the values > it reads are garbage. Our test notices this by comparing the output to a > non-corrupted run of the same command (and of course we also check that > the expected error is printed to stderr). > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > commit-graph.c | 12 +++++++++++- > t/t5318-commit-graph.sh | 9 +++++++++ > 2 files changed, 20 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 472332f603..9b80bbd75b 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -340,6 +340,16 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, > return 0; > } > > +static int graph_read_commit_data(const unsigned char *chunk_start, > + size_t chunk_size, void *data) > +{ > + struct commit_graph *g = data; > + if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is defined as (the_hash_algo->rawsz + 16), so I *think* that the expression in the parenthesis would get done as a size_t, and then g->num_commits would be widened to a size_t for the purposes of evaluating this expression. So I think that this is all OK in the sense that we'd never underflow the 64-bit space, and having more than 2^64-1/36 (some eighteen quintillion) commits is... unlikely ;-). But it may be worth wrapping this computation in an st_mult() anyway to avoid future readers having to think about this. > + return error("commit-graph commit data chunk is wrong size"); > + g->chunk_commit_data = chunk_start; > + return 0; > +} > + > static int graph_read_bloom_data(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > @@ -422,7 +432,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, > > read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph); > read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); > - pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); > + read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph); Here again would be a good use-case for a `pair_chunk_expect()` function, but I don't want to beat a dead horse ;-). Thanks, Taylor