Taylor Blau wrote: > When doing a `--geometric=<d>` repack, `git repack` determines a > splitting point among packs ordered by their object count such that: > > - each pack above the split has at least `<d>` times as many objects > as the next-largest pack by object count, and > - the first pack above the split has at least `<d>` times as many > object as the sum of all packs below the split line combined > > `git repack` then creates a pack containing all of the objects contained > in packs below the split line by running `git pack-objects > --stdin-packs` underneath. Once packs are moved into place, then any > packs below the split line are removed, since their objects were just > combined into a new pack. > > But `git repack` tries to be careful to avoid removing a pack that it > just wrote, by checking: > > struct packed_git *p = geometry->pack[i]; > if (string_list_has_string(&names, hash_to_hex(p->hash))) > continue; > > in the `delete_redundant` and `geometric` conditional towards the end of > `cmd_repack`. > > But it's possible to trick `git repack` into not recognizing a pack that > it just wrote when `names` is out-of-order (which violates > `string_list_has_string()`'s assumption that the list is sorted and thus > binary search-able). > > When this happens in just the right circumstances, it is possible to > remove a pack that we just wrote, leading to object corruption. > > Luckily, this is quite difficult to provoke in practice (for a couple of > reasons): > > - we ordinarily write just one pack, so `names` usually contains just > one entry, and is thus sorted > - when we do write more than one pack (e.g., due to `--max-pack-size`) > we have to: (a) write a pack identical to one that already > exists, (b) have that pack be below the split line, and (c) have > the set of packs written by `pack-objects` occur in an order which > tricks `string_list_has_string()`. > > Demonstrate the above scenario in a failing test, which causes `git > repack --geometric` to write a pack which occurs below the split line, > _and_ fail to recognize that it wrote that pack. > > The following patch will fix this bug. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh > index 91bb2b37a8..2cd1de7295 100755 > --- a/t/t7703-repack-geometric.sh > +++ b/t/t7703-repack-geometric.sh > @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly' > GIT_TEST_MULTI_PACK_INDEX=0 > > objdir=.git/objects > +packdir=$objdir/pack > midx=$objdir/pack/multi-pack-index > > test_expect_success '--geometric with no packs' ' > @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' ' > ) > ' > > +test_expect_failure '--geometric with pack.packSizeLimit' ' > + git init pack-rewrite && > + test_when_finished "rm -fr pack-rewrite" && > + ( > + cd pack-rewrite && > + > + test-tool genrandom foo 1048576 >foo && > + test-tool genrandom bar 1048576 >bar && > + I was a bit worried about this test being flaky in the future (relying on particular pseudorandomly-generated file contents and the subsequent ordering of hashes on the packs). But, since neither 'genrandom' nor the pack hash generation seem likely to change (and I can't come up with an alternative to this approach anyway), the test looks good as-is. > + git add foo bar && > + test_tick && > + git commit -m base && > + > + git rev-parse HEAD:foo HEAD:bar >p1.objects && > + git rev-parse HEAD HEAD^{tree} >p2.objects && > + > + # These two packs each contain two objects, so the following > + # `--geometric` repack will try to combine them. > + p1="$(git pack-objects $packdir/pack <p1.objects)" && > + p2="$(git pack-objects $packdir/pack <p2.objects)" && > + > + # Remove any loose objects in packs, since we do not want extra > + # copies around (which would mask over potential object > + # corruption issues). > + git prune-packed && > + > + # Both p1 and p2 will be rolled up, but pack-objects will write > + # three packs: > + # > + # - one containing object "foo", > + # - another containing object "bar", > + # - a final pack containing the commit and tree objects > + # (identical to p2 above) > + git repack --geometric 2 -d --max-pack-size=1048576 && > + > + # Ensure `repack` can detect that the third pack it wrote > + # (containing just the tree and commit objects) was identical to > + # one that was below the geometric split, so that we can save it > + # from deletion. > + # > + # If `repack` fails to do that, we will incorrectly delete p2, > + # causing object corruption. > + git fsck > + ) > +' > + > test_done