On Mon, Oct 09, 2023 at 04:59:51PM -0400, Jeff King wrote:
> We load the oid fanout chunk with pair_chunk(), which means we never see
> the size of the chunk. We just assume the on-disk file uses the
> appropriate size, and if it's too small we'll access random memory.
>
> It's easy to check this up-front; the fanout always consists of 256
> uint32's, since it is a fanout of the first byte of the hash pointing
> into the oid index. These parameters can't be changed without
> introducing a new chunk type.

Cool, this is the first patch that should start reducing our usage of
the new pair_chunk_unsafe() and hardening these reads. Let's take a
look...

> This matches the similar check in the midx OIDF chunk (but note that
> rather than checking for the error immediately, the graph code just
> leaves parts of the struct NULL and checks for required fields later).
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  commit-graph.c          | 13 +++++++++++--
>  t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index a689a55b79..9b3b01da61 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -305,6 +305,16 @@ static int verify_commit_graph_lite(struct commit_graph *g)
>  	return 0;
>  }
>
> +static int graph_read_oid_fanout(const unsigned char *chunk_start,
> +				  size_t chunk_size, void *data)
> +{
> +	struct commit_graph *g = data;
> +	if (chunk_size != 256 * sizeof(uint32_t))
> +		return error("commit-graph oid fanout chunk is wrong size");

Should we mark this string for translation?

> +	g->chunk_oid_fanout = (const uint32_t *)chunk_start;
> +	return 0;
> +}
> +

Nice. This makes sense and seems like an obvious improvement over the
existing code.

I wonder how common this pattern is. We have read_chunk() which is for
handling more complex scenarios than this.
But the safe version of pair_chunk() really just wants to check that the
size of the chunk is as expected and assign the location in the mmap to
some pointer.

Do you think it would be worth changing pair_chunk() to take an expected
size_t and handle this generically? I.e. have a version of
chunk-format::pair_chunk_fn() that looks something like:

    /* made-up struct: the destination pointer plus the expected size */
    struct pair_chunk_data {
    	const unsigned char **p;
    	size_t size;
    };

    static int pair_chunk_fn(const unsigned char *chunk_start,
    			     size_t chunk_size, void *_data)
    {
    	struct pair_chunk_data *data = _data;
    	if (chunk_size != data->size)
    		return -1;
    	*data->p = chunk_start;
    	return 0;
    }

and then our call here would be:

    if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
    		   (const unsigned char **)&graph->chunk_oid_fanout,
    		   256 * sizeof(uint32_t)) < 0)
    	return error("commit-graph oid fanout chunk is wrong size");

I dunno. It's hard to have a more concrete recommendation without having
read the rest of the series. So it's possible that this is just complete
nonsense ;-). But my hunch is that there are a number of callers that
would benefit from having this built in.

Thanks,
Taylor
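
As background for the 256-entry size check in the quoted patch, here is a
minimal standalone sketch of how a reader might use the fanout once the
size has been validated. It is not code from the series: fanout_range()
and read_be32() are hypothetical helpers, and the sketch assumes the
usual on-disk layout in which entry i holds, in network byte order, the
number of OIDs whose first byte is at most i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FANOUT_ENTRIES 256
#define FANOUT_BYTES (FANOUT_ENTRIES * sizeof(uint32_t))

/* Read one big-endian (network order) 32-bit value from raw bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper (not a git API): compute the half-open range
 * [*lo, *hi) of positions in the OID index whose hashes start with
 * first_byte. Returns -1 without touching the chunk if it is not
 * exactly 256 entries long, 0 otherwise.
 */
static int fanout_range(const unsigned char *chunk, size_t chunk_size,
			unsigned char first_byte,
			uint32_t *lo, uint32_t *hi)
{
	assert(chunk && lo && hi);

	if (chunk_size != FANOUT_BYTES)
		return -1;

	/* Entry i is cumulative, so the previous entry is the lower bound. */
	*lo = first_byte ? read_be32(chunk + 4 * (first_byte - 1)) : 0;
	*hi = read_be32(chunk + 4 * first_byte);
	return 0;
}
```

Without the size check, a lookup for a high first byte in a truncated
chunk would read past the end of the chunk, which is the "random memory"
access the commit message describes.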