Re: [PATCH v4 3/6] pack-objects: add --sparse option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> Add a '--sparse' option flag to the pack-objects builtin. This
> allows the user to specify that they want to use the new logic
> for walking trees. This logic currently does not differ from the
> existing output, but will in a later change.
>
> Create a new test script, t5322-pack-objects-sparse.sh, to ensure
> the object list that is selected matches what we expect. When we
> update the logic to walk in a sparse fashion, the final test will
> be updated to show the extra objects that are added.

Somehow checking the "these are exactly what we expect" feels
brittle.  In a history with three relevant commits A---B---C,
packing B..C could omit trees and blobs in C that appear in A but
not in B, but traditionally, because we stop traversal at B and do
not even look at A, we do not notice that such objects that need to
complete C's tree are already available in a repository that has B.
The approach of test in this patch feels similar to saying "we must
send these duplicates because we must stay dumb", even though with a
perfect knowledge about the history, e.g. bitmap, we would be able
to omit these objects in A that appear in C but not in B.

I think we want to test test both "are we sending enough, even
though we might be wasting some bandwidth by not noticing the other
side already have some?" and "are we still avoiding from becoming
overly stupid, even though we may be cheating to save traversal cost
and risking to redundantly send some objects?"

IOW, a set of tests to make sure that truly new objects are all sent
by seeing what is sent is a strict superset of what we expect.  And
another set of tests to make sure that objects that are so obviously
(this criterion may be highly subjective) be present in the
receiving repository (e.g. the tree object of commit B and what it
contains when seinding B..C) are never sent, even when using an
algorithm that are tuned for traversal cost over bandwidth
consumption.

The code change in this step looks all trivially good, and it may
want to be squashed into the previous step to become a single patch.
Otherwise, [2/6] would be adding a dead code that nobody exercises
until this step.

Thanks.


> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/git-pack-objects.txt |  11 ++-
>  builtin/pack-objects.c             |   5 +-
>  t/t5322-pack-objects-sparse.sh     | 115 +++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+), 2 deletions(-)
>  create mode 100755 t/t5322-pack-objects-sparse.sh
>
> diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
> index 40c825c381..e45f3e680d 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>  	[--local] [--incremental] [--window=<n>] [--depth=<n>]
>  	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
>  	[--stdout [--filter=<filter-spec>] | base-name]
> -	[--shallow] [--keep-true-parents] < object-list
> +	[--shallow] [--keep-true-parents] [--sparse] < object-list
>  
>  
>  DESCRIPTION
> @@ -196,6 +196,15 @@ depth is 4095.
>  	Add --no-reuse-object if you want to force a uniform compression
>  	level on all data no matter the source.
>  
> +--sparse::
> +	Use the "sparse" algorithm to determine which objects to include in
> +	the pack, when combined with the "--revs" option. This algorithm
> +	only walks trees that appear in paths that introduce new objects.
> +	This can have significant performance benefits when computing
> +	a pack to send a small change. However, it is possible that extra
> +	objects are added to the pack-file if the included commits contain
> +	certain types of direct renames.
> +
>  --thin::
>  	Create a "thin" pack by omitting the common objects between a
>  	sender and a receiver in order to reduce network transfer. This
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5f70d840a7..7d5b0735e3 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
>  static int depth = 50;
>  static int delta_search_threads;
>  static int pack_to_stdout;
> +static int sparse;
>  static int thin;
>  static int num_preferred_base;
>  static struct progress *progress_state;
> @@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
>  
>  	if (prepare_revision_walk(&revs))
>  		die(_("revision walk setup failed"));
> -	mark_edges_uninteresting(&revs, show_edge, 0);
> +	mark_edges_uninteresting(&revs, show_edge, sparse);
>  
>  	if (!fn_show_object)
>  		fn_show_object = show_object;
> @@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
>  		  N_("unpack unreachable objects newer than <time>"),
>  		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
> +		OPT_BOOL(0, "sparse", &sparse,
> +			 N_("use the sparse reachability algorithm")),
>  		OPT_BOOL(0, "thin", &thin,
>  			 N_("create thin packs")),
>  		OPT_BOOL(0, "shallow", &shallow,
> diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
> new file mode 100755
> index 0000000000..81f6805bc3
> --- /dev/null
> +++ b/t/t5322-pack-objects-sparse.sh
> @@ -0,0 +1,115 @@
> +#!/bin/sh
> +
> +test_description='pack-objects object selection using sparse algorithm'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup repo' '
> +	test_commit initial &&
> +	for i in $(test_seq 1 3)
> +	do
> +		mkdir f$i &&
> +		for j in $(test_seq 1 3)
> +		do
> +			mkdir f$i/f$j &&
> +			echo $j >f$i/f$j/data.txt
> +		done
> +	done &&
> +	git add . &&
> +	git commit -m "Initialized trees" &&
> +	for i in $(test_seq 1 3)
> +	do
> +		git checkout -b topic$i master &&
> +		echo change-$i >f$i/f$i/data.txt &&
> +		git commit -a -m "Changed f$i/f$i/data.txt"
> +	done &&
> +	cat >packinput.txt <<-EOF &&
> +	topic1
> +	^topic2
> +	^topic3
> +	EOF
> +	git rev-parse			\
> +		topic1			\
> +		topic1^{tree}		\
> +		topic1:f1		\
> +		topic1:f1/f1		\
> +		topic1:f1/f1/data.txt | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +# Demonstrate that both algorithms send "extra" objects because
> +# they are not in the frontier.
> +
> +test_expect_success 'duplicate a folder from f3 and commit to topic1' '
> +	git checkout topic1 &&
> +	echo change-3 >f3/f3/data.txt &&
> +	git commit -a -m "Changed f3/f3/data.txt" &&
> +	git rev-parse			\
> +		topic1~1		\
> +		topic1~1^{tree}		\
> +		topic1^{tree}		\
> +		topic1			\
> +		topic1:f1		\
> +		topic1:f1/f1		\
> +		topic1:f1/f1/data.txt	\
> +		topic1:f3		\
> +		topic1:f3/f3		\
> +		topic1:f3/f3/data.txt | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +test_expect_success 'duplicate a folder from f1 into f3' '
> +	mkdir f3/f4 &&
> +	cp -r f1/f1/* f3/f4 &&
> +	git add f3/f4 &&
> +	git commit -m "Copied f1/f1 to f3/f4" &&
> +	cat >packinput.txt <<-EOF &&
> +	topic1
> +	^topic1~1
> +	EOF
> +	git rev-parse		\
> +		topic1		\
> +		topic1^{tree}	\
> +		topic1:f3 | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +test_done



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux