Re: [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau wrote:
> When doing a `--geometric=<d>` repack, `git repack` determines a
> splitting point among packs ordered by their object count such that:
> 
>   - each pack above the split has at least `<d>` times as many objects
>     as the next-largest pack by object count, and
>   - the first pack above the split has at least `<d>` times as many
>     object as the sum of all packs below the split line combined
> 
> `git repack` then creates a pack containing all of the objects contained
> in packs below the split line by running `git pack-objects
> --stdin-packs` underneath. Once packs are moved into place, then any
> packs below the split line are removed, since their objects were just
> combined into a new pack.
> 
> But `git repack` tries to be careful to avoid removing a pack that it
> just wrote, by checking:
> 
>     struct packed_git *p = geometry->pack[i];
>     if (string_list_has_string(&names, hash_to_hex(p->hash)))
>       continue;
> 
> in the `delete_redundant` and `geometric` conditional towards the end of
> `cmd_repack`.
> 
> But it's possible to trick `git repack` into not recognizing a pack that
> it just wrote when `names` is out-of-order (which violates
> `string_list_has_string()`'s assumption that the list is sorted and thus
> binary search-able).
> 
> When this happens in just the right circumstances, it is possible to
> remove a pack that we just wrote, leading to object corruption.
> 
> Luckily, this is quite difficult to provoke in practice (for a couple of
> reasons):
> 
>   - we ordinarily write just one pack, so `names` usually contains just
>     one entry, and is thus sorted
>   - when we do write more than one pack (e.g., due to `--max-pack-size`)
>     we have to: (a) write a pack identical to one that already
>     exists, (b) have that pack be below the split line, and (c) have
>     the set of packs written by `pack-objects` occur in an order which
>     tricks `string_list_has_string()`.
> 
> Demonstrate the above scenario in a failing test, which causes `git
> repack --geometric` to write a pack which occurs below the split line,
> _and_ fail to recognize that it wrote that pack.
> 
> The following patch will fix this bug.
> 
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 91bb2b37a8..2cd1de7295 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly'
>  GIT_TEST_MULTI_PACK_INDEX=0
>  
>  objdir=.git/objects
> +packdir=$objdir/pack
>  midx=$objdir/pack/multi-pack-index
>  
>  test_expect_success '--geometric with no packs' '
> @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
>  	)
>  '
>  
> +test_expect_failure '--geometric with pack.packSizeLimit' '
> +	git init pack-rewrite &&
> +	test_when_finished "rm -fr pack-rewrite" &&
> +	(
> +		cd pack-rewrite &&
> +
> +		test-tool genrandom foo 1048576 >foo &&
> +		test-tool genrandom bar 1048576 >bar &&
> +

I was a bit worried about this test being flaky in the future (relying on
particular pseudorandomly-generated file contents and the subsequent
ordering of hashes on the packs). But, since neither 'genrandom' nor the
pack hash generation seem likely to change (and I can't come up with an
alternative to this approach anyway), the test looks good as-is. 

> +		git add foo bar &&
> +		test_tick &&
> +		git commit -m base &&
> +
> +		git rev-parse HEAD:foo HEAD:bar >p1.objects &&
> +		git rev-parse HEAD HEAD^{tree} >p2.objects &&
> +
> +		# These two packs each contain two objects, so the following
> +		# `--geometric` repack will try to combine them.
> +		p1="$(git pack-objects $packdir/pack <p1.objects)" &&
> +		p2="$(git pack-objects $packdir/pack <p2.objects)" &&
> +
> +		# Remove any loose objects in packs, since we do not want extra
> +		# copies around (which would mask over potential object
> +		# corruption issues).
> +		git prune-packed &&
> +
> +		# Both p1 and p2 will be rolled up, but pack-objects will write
> +		# three packs:
> +		#
> +		#   - one containing object "foo",
> +		#   - another containing object "bar",
> +		#   - a final pack containing the commit and tree objects
> +		#     (identical to p2 above)
> +		git repack --geometric 2 -d --max-pack-size=1048576 &&
> +
> +		# Ensure `repack` can detect that the third pack it wrote
> +		# (containing just the tree and commit objects) was identical to
> +		# one that was below the geometric split, so that we can save it
> +		# from deletion.
> +		#
> +		# If `repack` fails to do that, we will incorrectly delete p2,
> +		# causing object corruption.
> +		git fsck
> +	)
> +'
> +
>  test_done




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux