On 8/25/2020 10:41 AM, Taylor Blau wrote: > On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote: >> The code in builtin/repack.c looks good for sure. I have a quick question >> about this new test: >> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' ' >> + git multi-pack-index write && >> + cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak && >> + test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" && >> + >> + # Write a new pack that is unknown to the multi-pack-index. >> + git hash-object -w </dev/null >blob && >> + git pack-objects $objdir/pack/pack <blob && >> + >> + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d && >> + test_cmp_bin $objdir/pack/multi-pack-index \ >> + $objdir/pack/multi-pack-index.bak >> +' >> + >> >> You create an arbitrary blob, and then add it to a pack-file. Do we >> know that 'git repack' is definitely creating a new pack-file that makes >> our manually-created pack-file redundant? >> >> My suggestion is to have the test check itself: >> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' ' >> + git multi-pack-index write && >> + cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak && >> + test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" && >> + >> + # Write a new pack that is unknown to the multi-pack-index. >> + git hash-object -w </dev/null >blob && >> + HASH=$(git pack-objects $objdir/pack/pack <blob) && >> + >> + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d && >> + test_cmp_bin $objdir/pack/multi-pack-index \ >> + $objdir/pack/multi-pack-index.bak && >> + test_path_is_missing $objdir/pack/pack-$HASH.pack >> +' >> + >> >> This test fails for me, on the 'test_path_is_missing'. Likely, the >> blob is seen as already in a pack-file so is just pruned by 'git repack' >> instead. I thought that perhaps we need to add a new pack ourselves that >> overrides the small pack. Here is my attempt: >> >> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' ' >> git multi-pack-index write && >> cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak && >> test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" && >> >> # Write a new pack that is unknown to the multi-pack-index. >> BLOB1=$(echo blob1 | git hash-object -w --stdin) && >> BLOB2=$(echo blob2 | git hash-object -w --stdin) && >> cat >blobs <<-EOF && >> $BLOB1 >> $BLOB2 >> EOF >> HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) && >> HASH2=$(git pack-objects $objdir/pack/pack <blobs) && >> GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d && >> test_cmp_bin $objdir/pack/multi-pack-index \ >> $objdir/pack/multi-pack-index.bak && >> test_path_is_file $objdir/pack/pack-$HASH2.pack && >> test_path_is_missing $objdir/pack/pack-$HASH1.pack >> ' >> >> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure >> how to make sure your logic is tested. I saw that 'git repack' was writing >> "nothing new to pack" in the output, so I also tested adding a few commits and >> trying to force it to repack reachable data, but I cannot seem to trigger it >> to create a new pack that overrides only one pack that is not in the MIDX. >> >> Likely, I just don't know how 'git rebase' works well enough to trigger this >> behavior. But the test as-is is not testing what you want it to test. > > I think this case might actually be impossible to tickle in a test. I > thought that 'git repack -d' looked for existing packs whose objects are > a subset of some new pack generated. But, it's much simpler than that: > '-d' by itself just looks for packs that were already on disk with the > same SHA-1 as a new pack, and it removes the old one. If 'git repack' never calls remove_redundant_pack() unless we are doing a "full repack", then we could simplify this logic: static void remove_redundant_pack(const char *dir_name, const char *base_name) { struct strbuf buf = STRBUF_INIT; - strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name); + struct multi_pack_index *m = get_multi_pack_index(the_repository); + strbuf_addf(&buf, "%s.pack", base_name); + if (m && midx_contains_pack(m, buf.buf)) + clear_midx_file(the_repository); + strbuf_insertf(&buf, 0, "%s/", dir_name); unlink_pack_path(buf.buf, 1); strbuf_release(&buf); } to static void remove_redundant_pack(const char *dir_name, const char *base_name) { struct strbuf buf = STRBUF_INIT; strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name); + clear_midx_file(the_repository); unlink_pack_path(buf.buf, 1); strbuf_release(&buf); } and get the same results as we are showing in these tests. This does move us incrementally to a better situation: don't delete the MIDX if we aren't deleting pack files. But, I think we can get around it. > Note that 'git repack' uses 'git pack-objects' internally to find > objects and generate a packfile. When calling 'git pack-objects', 'git > repack -d' passes '--all' and '--unpacked', which means that there is no > way we'd generate a new pack with the same SHA-1 as an existing pack > ordinarily. > > So, I think this case is impossible, or at least astronomically > unlikely. What is more interesting (and untested) is that adding a _new_ > pack doesn't cause us to invalidate the MIDX. Here's a patch that does > that: > > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh > index 16a1ad040e..620f2058d6 100755 > --- a/t/t5319-multi-pack-index.sh > +++ b/t/t5319-multi-pack-index.sh > @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' ' > test_path_is_missing $objdir/pack/multi-pack-index > ' > > -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' ' > - git multi-pack-index write && > - cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak && > - test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" && > - > - # Write a new pack that is unknown to the multi-pack-index. > - git hash-object -w </dev/null >blob && > - git pack-objects $objdir/pack/pack <blob && > - > - GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d && > - test_cmp_bin $objdir/pack/multi-pack-index \ > - $objdir/pack/multi-pack-index.bak > +test_expect_success 'repack preserves multi-pack-index when creating packs' ' > + git init preserve && > + test_when_finished "rm -fr preserve" && > + ( > + cd preserve && > + midx=.git/objects/pack/multi-pack-index && > + > + test_commit "initial" && > + git repack -ad && > + git multi-pack-index write && > + ls .git/objects/pack | grep "\.pack$" >before && > + > + cp $midx $midx.bak && > + > + test_commit "another" && > + GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d && > + ls .git/objects/pack | grep "\.pack$" >after && > + > + test_cmp_bin $midx.bak $midx && > + ! test_cmp before after > + ) > ' After looking at the callers to remote_redundant_pack() I noticed that it is only called after inspecting the "names" struct, which contains the names of the packs to group into a new pack-file. We can use a .keep file to preserve the pack-file(s) in the MIDX but also to ensure multiple pack-files outside of the MIDX are repacked and deleted. While this is very unlikely in the wild, it is definitely possible. test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' ' git init preserve && test_when_finished "rm -fr preserve" && ( cd preserve && midx=.git/objects/pack/multi-pack-index && test_commit 1 && HASH1=$(git pack-objects --all .git/objects/pack/pack) && touch .git/objects/pack/pack-$HASH1.keep && cat >pack-input <<-\EOF && HEAD ^HEAD~1 EOF test_commit 2 && HASH2=$(git pack-objects --revs .git/objects/pack/pack <pack-input) && touch .git/objects/pack/pack-$HASH2.keep && git multi-pack-index write && cp $midx $midx.bak && test_commit 3 && HASH3=$(git pack-objects --revs .git/objects/pack/pack <pack-input) && test_commit 4 && HASH4=$(git pack-objects --revs .git/objects/pack/pack <pack-input) && GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad && test_path_is_file .git/objects/pack/pack-$HASH1.pack && test_path_is_file .git/objects/pack/pack-$HASH2.pack && test_path_is_missing .git/objects/pack/pack-$HASH3.pack && test_path_is_missing .git/objects/pack/pack-$HASH4.pack ) ' I believe this checks your condition properly enough. Thanks, -Stolee