On 2/4/2021 7:00 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> When calculating the sizes of certain chunks, we should use 64-bit >> multiplication always. This allows us to properly predict the chunk >> sizes without risk of overflow. >> >> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> --- >> midx.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) > > This one I find somewhat questionable for multiple reasons. > > * the fourth parameter of add_chunk() is of size_t, not uint64_t; > shouldn't the multiplication be done in type size_t instead? This is probably appropriate because we will truncate to size_t if it is smaller than uint64_t. > * these mutiplications were introduced in "midx: use chunk-format > API in write_midx_internal()"; that step should use the > arithmetic with cast (if necessary) from the start, no? I wanted to isolate these changes specifically so we could be careful about the multiplications and not be distracted by them when converting to the chunk-format API. The multiplications were "moved" by that patch, not "introduced". > * There is "ctx.entries_nr * MIDX_CHUNKID_OFFSET_WIDTH" passed to > add_chunk(), in the post-context of the first hunk. Shouldn't > that be covered as well? I didn't grep for all uses of > add_chunk(), but I wouldn't be surprised if this patch missed > some of the calls that need the same treatment. And here is a great example of why it was good to call out these multiplications in their own patch. I did a full inspection of all multiplications in midx.c and found a few more instances of possible overflow. Two are on the read side, but they require the object lookup chunk to have size 4gb or larger. This is not _that_ far off from possibility! My multi-pack-index for the Windows repository is currently ~1.6 GB (in total, including the other chunks). Thanks, -Stolee