On Wed, Oct 11, 2023 at 02:38:07PM -0400, Taylor Blau wrote: > > offset32 ^= MIDX_LARGE_OFFSET_NEEDED; > > - return get_be64(m->chunk_large_offsets + > > - st_mult(sizeof(uint64_t), offset32)); > > + if (offset32 >= m->chunk_large_offsets_len / sizeof(uint64_t)) > > + die(_("multi-pack-index large offset out of bounds")); > > + return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32); > > Makes sense, and this seems very reasonable. I had a couple of thoughts > on this hunk, one nitpick, and one non-nitpick ;-). > > The nitpick is the easy one, which is that I typically think of scaling > some index to produce an offset into some chunk, instead of dividing and > going the other way. > > So I probably would have written something like: > > if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len) > die(_("multi-pack-index large offset out of bounds")); Yeah, I admit that my inclination is to think of it that way, too, and the division is a bit of an inversion. But I guess I am hard-wired to do bounds checks with division, because I know it avoids overflow issues. And the behavior is actually different, since st_mult() dies (with a somewhat vague message) rather than returning with an error. I was actually tempted to write a small inline helper to do bounds-checks (since this pattern appears a few times in this series). But of course it suffers from the same issues (it dies with a vague message, or it returns a result code and then is awkward to call/use because you have to stick the output in an out-parameter). > But that is definitely a subjective/minor point, and I would not at all > be sad if you felt differently about it. That said, I do wish for a > little more information in the die() message, perhaps: > > if (st_mult(offset32, sizeof(uint64_t)) >= m->chunk_large_offsets_len) > die(_("multi-pack-index large offset for %s out of bounds " > "(%"PRIuMAX" is beyond %"PRIuMAX")"), > (uintmax_t)(offset32 * sizeof(uint64_t)), /* checked earlier */ > (uintmax_t)m->chunk_large_offsets_len); > > I can imagine that for debugging corrupt MIDXs in the future, having > some extra information like the above would give us a better starting > point than popping into a debugger to get the same information. As you'll see when you get to the BDAT/BIDX messages, I put in a load of detail. That's because I did those ones first, and then after seeing the terse "eh, it's the wrong size" messages in the rest of the commit-graph and midx code, I just followed suit. We can circle back and improve the detail in the messages. One slightly annoying thing is dealing with it in the tests. We'd have to do one of: - make the tests more brittle by hard-coding positions and offsets we don't necessarily care about - loosen the tests by just matching with "grep", which can sometimes miss other unexpected output - do some post-processing on the output to massage out the details we don't care about; this in theory is the best of both worlds, but reliably post-processing is non-trivial. So of the three I'd probably just loosen to use "grep" in most spots. -Peff