On Fri, Nov 10, 2023 at 01:55:48PM +0900, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > +static int pair_chunk_expect_fn(const unsigned char *chunk_start, > > + size_t chunk_size, > > + void *data) > > +{ > > + struct pair_chunk_data *pcd = data; > > + if (chunk_size / pcd->record_size != pcd->record_nr) > > + return -1; > > + *pcd->p = chunk_start; > > + return 0; > > +} > > I know one of the original places did the "divide the whole by > per-record size and see if it matches the number of records", the > same as we see above, but the check in the above could also be > > if (chunk_size != st_mult(pcd->record_size, pcd->record_nr)) > return -1; > > which would also catch the case where chunk_size is not a multiple > of the record size. Your conversion of OOFF in midx.c loses this > protection as the original uses the multiplication-and-compare, but > the rewrite to call pair_chunk_expect would call the above and > checks with the truncating-divide-and-compare. Hmm. I was thinking of Peff's "commit-graph: handle overflow in chunk_size checks", but I think that I was overly eager in applying the same reasoning to the MIDX code. The important piece of the rationale in that patch is as follows: In the current code this is only possible for the CDAT chunk, but the reasons are quite subtle. We compute g->num_commits by dividing the size of the OIDL chunk by the hash length (since it consists of a bunch of hashes). So we know that any size_t multiplication that uses a value smaller than the hash length cannot overflow. And the CDAT records are the only ones that are larger (the others are just 4-byte records). So it's worth fixing all of these, to make it clear that they're not subject to overflow (without having to reason about seemingly unrelated code). In particular, that g->num_commits is computed by dividing the length of the OIDL chunk by the hash length, thus any size_t multiplication of g->num_commits with a value smaller than that hash length cannot overflow. But I don't think we enjoy the same benefits in the MIDX scenario. In this case, the num_objects field is just: m->num_objects = ntohl(m->chunk_oid_fanout[255]) so I don't think we can make the same guarantees about what is and isn't save to compute under size_t arithmetic. I'd be curious what Peff has to say, but if he agrees with me, I'd recommend taking the first five patches, and dropping the two MIDX-related ones. Thanks, Taylor