On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote: > 'git multi-pack-index verify' inspects the data in an existing MIDX for > correctness by checking that the recorded object offsets are correct, > and so on. > > But it does not check that the file's trailing checksum matches the data > that it records. So, if an on-disk corruption happened to occur in the > final few bytes (and all other data was recorded correctly), we would: > > - get a clean result from 'git multi-pack-index verify', but > - be unable to reuse the existing MIDX when writing a new one (since > we now check for checksum mismatches before reusing a MIDX) > > Teach the 'verify' sub-command to recognize corruption in the checksum > by calling midx_checksum_valid(). > > Suggested-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > midx.c | 3 +++ > t/t5319-multi-pack-index.sh | 5 +++++ > 2 files changed, 8 insertions(+) > > diff --git a/midx.c b/midx.c > index a12cbbf928..9a35b0255d 100644 > --- a/midx.c > +++ b/midx.c > @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag > return result; > } > > + if (!midx_checksum_valid(m)) > + midx_report(_("incorrect checksum")); > + > if (flags & MIDX_PROGRESS) > progress = start_delayed_progress(_("Looking for referenced packfiles"), > m->num_packs); > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh > index d582f370c4..7609f1ea64 100755 > --- a/t/t5319-multi-pack-index.sh > +++ b/t/t5319-multi-pack-index.sh > @@ -418,6 +418,11 @@ test_expect_success 'corrupt MIDX is not reused' ' > git multi-pack-index verify > ' > > +test_expect_success 'verify incorrect checksum' ' > + pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) && > + corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum" This test is flaky and can fail with: ... + printf \377 + dd of=.git/objects/pack/multi-pack-index bs=1 seek=3839 conv=notrunc 1+0 records in 1+0 records out 1 byte copied, 5.0656e-05 s, 19.7 kB/s + test_must_fail git multi-pack-index verify --object-dir=.git/objects test_must_fail: command succeeded: git multi-pack-index verify --object-dir=.git/objects error: last command exited with $?=1 + mv midx-backup .git/objects/pack/multi-pack-index not ok 44 - verify incorrect checksum So the test corrupts the checksum trailer in the 'multi-pack-index' file by overwriting its last byte with 0xff, but if that byte were already 0xff, then the file would be left as is, and 'git multi-pack-index verify' wouldn't find anything amiss. Since SHA1 is essentially random, there's a 1:256 chance of that happening, assuming that the file's content is random. That't not really the case, however, because both the test repository's objects and the resulting packfiles are deterministic, and, consequently, the content of the MIDX is somewhat deterministic. Only "somewhat", though, because several of those objects appear in multiple packfiles, and the MIDX selects a copy in the most recently modified packfile, so filesystem mtime resolution and second boundaries become significant, and cause some variance in the contents of the OOFF chunk. Recently a laptop crossed my way that is somehow exceptionally good at generating MIDX files ending with 0xff: # ad-hoc checksum statistics: $ git diff diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index f1ee2ce56d..605713b518 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -482,9 +482,13 @@ test_expect_success 'corrupt MIDX is not reused' ' ' test_expect_success 'verify incorrect checksum' ' + skip=$(($(wc -c <$objdir/pack/multi-pack-index) - 20)) && + printf "checksum: " >&5 && + od -x -w20 -j$skip --endian=big -An "$objdir/pack/multi-pack-index" >&5 && pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) && corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum" ' +test_done test_expect_success 'repack progress off for redirected stderr' ' GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err && $ for i in {1..500} ; do ./t5319-multi-pack-index.sh |sed -n -e 's/^checksum: //p' ; done |sort |uniq -c 31 1a70 3b1c 8ed3 56a6 5101 2a38 057e 698d 6faf fbaa 340 5fc0 552f 0ac0 c876 f229 d9e3 ef13 a314 5847 89ff 129 ce7d 3710 fd21 ef7b 8316 2b99 4e6c e5d5 5e7c 7b08 That's almost 70% failure rate, though I haven't been able to trigger this failure on any of the other machines that I have access to.