On Mon, Oct 09, 2023 at 05:05:53PM -0400, Jeff King wrote: > We load the bloom_filter_indexes chunk using pair_chunk(), so we have no > idea how big it is. This can lead to out-of-bounds reads if it is > smaller than expected, since we index it based on the number of commits > found elsewhere in the graph file. > > We can check the chunk size up front, like we do for CDAT and other > chunks with one fixed-size record per commit. > > The test case demonstrates the problem. It actually won't segfault, > because we end up reading random data from the follow-on chunk (BDAT in > this case), and the bounds checks added in the previous patch complain. > But this is by no means assured, and you can craft a commit-graph file > with BIDX at the end (or a smaller BDAT) that does segfault. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > commit-graph.c | 16 ++++++++++++++-- > t/t4216-log-bloom.sh | 9 +++++++++ > 2 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index f7a42be6d0..1f334987b5 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -360,6 +360,18 @@ static int graph_read_generation_data(const unsigned char *chunk_start, > return 0; > } > > +static int graph_read_bloom_index(const unsigned char *chunk_start, > + size_t chunk_size, void *data) > +{ > + struct commit_graph *g = data; > + if (chunk_size != g->num_commits * 4) { Here we probably should wrap `g->num_commits * 4` in an st_mult() call. Having 4B commits is probably pretty unlikely in practice, but we have definitely seen issues in the wild at GitHub where we have more than 2^32-1/20 commits in a single network.git. > + warning("commit-graph changed-path index chunk is too small"); Should this be marked for translation? > + return -1; > + } > + g->chunk_bloom_indexes = chunk_start; > + return 0; > +} > + > static int graph_read_bloom_data(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > @@ -470,8 +482,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, > } > > if (s->commit_graph_read_changed_paths) { > - pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES, > - &graph->chunk_bloom_indexes); > + read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, > + graph_read_bloom_index, graph); Since we know g->num_commits by this point, this would be another spot that would benefit from a `pair_chunk_expect()` function. But, again, not trying to beat a dead horse here or anything ;-). Thanks, Taylor