Re: [PATCH 1/7] chunk-format: introduce `pair_chunk_expect()` helper

Jeff King <peff@xxxxxxxx> · Fri, 10 Nov 2023 17:01:42 -0500

On Fri, Nov 10, 2023 at 11:27:41AM -0500, Taylor Blau wrote:

> Hmm. I was thinking of Peff's "commit-graph: handle overflow in
> chunk_size checks", but I think that I was overly eager in applying the
> same reasoning to the MIDX code.
> 
> The important piece of the rationale in that patch is as follows:
> 
>     In the current code this is only possible for the CDAT chunk, but
>     the reasons are quite subtle. We compute g->num_commits by dividing
>     the size of the OIDL chunk by the hash length (since it consists of
>     a bunch of hashes). So we know that any size_t multiplication that
>     uses a value smaller than the hash length cannot overflow. And the
>     CDAT records are the only ones that are larger (the others are just
>     4-byte records). So it's worth fixing all of these, to make it clear
>     that they're not subject to overflow (without having to reason about
>     seemingly unrelated code).
> 
> In particular, that g->num_commits is computed by dividing the length of
> the OIDL chunk by the hash length, thus any size_t multiplication of
> g->num_commits with a value smaller than that hash length cannot
> overflow.
> 
> But I don't think we enjoy the same benefits in the MIDX scenario. In
> this case, the num_objects field is just:
> 
>     m->num_objects = ntohl(m->chunk_oid_fanout[255])
> 
> so I don't think we can make the same guarantees about what is and isn't
> save to compute under size_t arithmetic.

Yes, the logic does not apply to the midx code (just like the graph code
after we switch to using the fanout value later in my series). But that
paragraph was just explaining why only the CDAT chunk was _currently_
vulnerable.

The more interesting question is the paragraphs after that, about
whether it is OK to die() or not when we see overflow (and IMHO it is
not for commit-graphs).

The situation _is_ different there for midx's, because we immediately
die() if we see a bogus midx anyway. But I think that is wrong, and
something we may want to change in the long run. Both because it's the
wrong thing for lib-ification, but also because we can easily keep going
if the midx is broken; we can still use the individual pack idx files.

> I'd be curious what Peff has to say, but if he agrees with me, I'd
> recommend taking the first five patches, and dropping the two
> MIDX-related ones.

I think dropping those is a bad direction. The point of adding
pair_chunk_expect() is that we could use it consistently.

-Peff