Re: [PATCH v2] revision: add `--ignore-missing-links` user option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 12, 2023 at 05:58:20PM +0200, Karthik Nayak wrote:
> The revision backend is used by multiple porcelain commands such as
> git-rev-list(1) and git-log(1). The backend currently supports ignoring
> missing links by setting the `ignore_missing_links` bit. This allows the
> revision walk to skip any objects links which are missing. Expose this
> bit via an `--ignore-missing-links` user option.
>
> A scenario where this option would be used is to find the boundary
> objects between different object directories. Consider a repository with
> a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
> object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
> repository, enabling this option along with the `--boundary` option for
> while disabling the alternate object directory allows us to find the
> boundary objects between the main and alternate object directory.
>
> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
> ---
>
> Changes from v1:
> 1. Changes in the commit message and option description to be more specific
> and list why and what the changes are.
> 2. Ensure the new option also works with the existing `--objects` options.
> 3. More specific testing for boundary commit.
>
> Range diff against v1:
>
> 1:  c0a4dca9b0 ! 1:  e3f4d85732 revision: add `--ignore-missing-links` user option
>     @@ Commit message
>          The revision backend is used by multiple porcelain commands such as
>          git-rev-list(1) and git-log(1). The backend currently supports ignoring
>          missing links by setting the `ignore_missing_links` bit. This allows the
>     -    revision walk to skip any objects links which are missing.
>     +    revision walk to skip any objects links which are missing. Expose this
>     +    bit via an `--ignore-missing-links` user option.
>
>     -    Currently there is no way to use git-rev-list(1) to traverse the objects
>     -    of the main object directory (GIT_OBJECT_DIRECTORY) and print the
>     -    boundary objects when moving from the main object directory to the
>     -    alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
>     -
>     -    By exposing this new flag `--ignore-missing-links`, users can set the
>     -    required env variables (GIT_OBJECT_DIRECTORY and
>     -    GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
>     -    find the boundary objects between object directories.
>     +    A scenario where this option would be used is to find the boundary
>     +    objects between different object directories. Consider a repository with
>     +    a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
>     +    object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
>     +    repository, enabling this option along with the `--boundary` option for
>     +    while disabling the alternate object directory allows us to find the
>     +    boundary objects between the main and alternate object directory.
>
>          Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
>
>     @@ Documentation/rev-list-options.txt: explicitly.
>       	the bad input was not given.
>
>      +--ignore-missing-links::
>     -+	When an object points to another object that is missing, pretend as if the
>     -+	link did not exist. These missing links are not written to stdout unless
>     -+	the --boundary flag is passed.
>     ++	During traversal, if an object that is referenced does not
>     ++	exist, instead of dying of a repository corruption, pretend as
>     ++	if the reference itself does not exist. Running the command
>     ++	with the `--boundary` option makes these missing commits,
>     ++	together with the commits on the edge of revision ranges
>     ++	(i.e. true boundary objects), appear on the output, prefixed
>     ++	with '-'.
>      +
>       ifndef::git-rev-list[]
>       --bisect::
>       	Pretend as if the bad bisection ref `refs/bisect/bad`
>
>     + ## builtin/rev-list.c ##
>     +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
>     + {
>     + 	struct rev_list_info *info = cb_data;
>     + 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
>     +-		finish_object__ma(obj);
>     ++		if (!info->revs->ignore_missing_links)
>     ++			finish_object__ma(obj);
>     + 		return 1;
>     + 	}
>     + 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
>     +
>       ## revision.c ##
>      @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>       		revs->limited = 1;
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +test_expect_success 'create repository and alternate directory' '
>      +	git init main &&
>      +	test_commit_bulk -C main 5 &&
>     ++	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
>      +	mkdir alt &&
>      +	mv main/.git/objects/* alt &&
>      +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
>      +'
>      +
>     -+# When the alternate odb is provided, all commits are listed.
>     ++# when the alternate odb is provided, all commits are listed along with the boundary
>     ++# commit.
>      +test_expect_success 'rev-list passes with alternate object directory' '
>     -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
>     ++	test_stdout_line_count = 10 cat actual &&
>     ++	grep $BOUNDARY_COMMIT actual
>      +'
>      +
>      +# When the alternate odb is not provided, rev-list fails since the 5th commit's
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +'
>      +
>      +# With `--ignore-missing-links`, we stop the traversal when we encounter a
>     -+# missing link.
>     ++# missing link. The boundary commit is not listed as we haven't used the
>     ++# `--boundary` options.
>      +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
>     -+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
>     ++	git -C main rev-list --ignore-missing-links HEAD >actual &&
>     ++	test_stdout_line_count = 5 cat actual &&
>     ++	! grep -$BOUNDARY_COMMIT actual
>      +'
>      +
>      +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
>      +# commits.
>      +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
>     -+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
>     -+	test_stdout_line_count = 6 cat list-output &&
>     -+	test_stdout_line_count = 1 cat list-output | grep "^-"
>     ++	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
>     ++	test_stdout_line_count = 6 cat actual &&
>     ++	grep -$BOUNDARY_COMMIT actual
>     ++'
>     ++
>     ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
>     ++# fail when used alongside `--objects` when a tree is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
>     ++	echo "foo" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
>     ++	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
>     ++	mkdir alt/${TREE_OID:0:2} &&
>     ++	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $TREE_OID actual
>     ++'
>     ++
>     ++# Similar to above, it should also work when a blob is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
>     ++	echo "bar" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
>     ++	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
>     ++	mkdir alt/${BLOB_OID:0:2} &&
>     ++	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $BLOB_OID actual
>      +'
>      +
>      +test_done
>
>
>  Documentation/rev-list-options.txt |  9 ++++
>  builtin/rev-list.c                 |  3 +-
>  revision.c                         |  2 +
>  t/t6022-rev-list-alternates.sh     | 75 ++++++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100755 t/t6022-rev-list-alternates.sh
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.
> +
>  ifndef::git-rev-list[]
>  --bisect::
>  	Pretend as if the bad bisection ref `refs/bisect/bad`
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index ff715d6918..5239d83c76 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
>  {
>  	struct rev_list_info *info = cb_data;
>  	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
> -		finish_object__ma(obj);
> +		if (!info->revs->ignore_missing_links)
> +			finish_object__ma(obj);
>  		return 1;
>  	}
>  	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)


> diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
> new file mode 100755
> index 0000000000..08d9ffde5f
> --- /dev/null
> +++ b/t/t6022-rev-list-alternates.sh
> @@ -0,0 +1,75 @@
> +#!/bin/sh
> +
> +test_description='handling of alternates in rev-list'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +# We create 5 commits and move them to the alt directory and
> +# create 5 more commits which will stay in the main odb.
> +test_expect_success 'create repository and alternate directory' '
> +	git init main &&

We don't necessarily have to initialize a repository, as the test suite
already does so for us. So we may want to write this instead as:

    test_commit_bulk 5 &&
    git clone --reference=. --shared . alt &&
    test_commit_bulk -C alt --start=6 5

> +# when the alternate odb is provided, all commits are listed along with the boundary
> +# commit.
> +test_expect_success 'rev-list passes with alternate object directory' '
> +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
> +	test_stdout_line_count = 10 cat actual &&
> +	grep $BOUNDARY_COMMIT actual
> +'

Here, I think we'd want to make sure that we have not just 10 lines of
output, but that they are the 10 that we expect, like so:

    git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    {
      git rev-list --all --objects --no-object-names &&
      git -C alt rev-list --all --objects --no-object-names --not \
        --alternate-refs
    } >expect.raw &&
    sort actual.raw >actual &&
    sort expect.raw >expect &&
    test_cmp expect actual

When reviewing this and tweaking some of the tests locally, I found it
useful to have some convenience functions like "hide_alternates" and
"show_alternates" to control whether or not "alt" could see its
alternate or not.

>From my review locally, the resulting changes (which can be applied
directly on top of your patch here look like):

--- 8< ---
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
index 08d9ffde5f..ef4231b2de 100755
--- a/t/t6022-rev-list-alternates.sh
+++ b/t/t6022-rev-list-alternates.sh
@@ -8,68 +8,86 @@ TEST_PASSES_SANITIZE_LEAK=true
 # We create 5 commits and move them to the alt directory and
 # create 5 more commits which will stay in the main odb.
 test_expect_success 'create repository and alternate directory' '
-	git init main &&
-	test_commit_bulk -C main 5 &&
-	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
-	mkdir alt &&
-	mv main/.git/objects/* alt &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
 '

 # when the alternate odb is provided, all commits are listed along with the boundary
 # commit.
 test_expect_success 'rev-list passes with alternate object directory' '
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
-	test_stdout_line_count = 10 cat actual &&
-	grep $BOUNDARY_COMMIT actual
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
 # When the alternate odb is not provided, rev-list fails since the 5th commit's
 # parent is not present in the main odb.
 test_expect_success 'rev-list fails without alternate object directory' '
-	test_must_fail git -C main rev-list HEAD
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
 '

 # With `--ignore-missing-links`, we stop the traversal when we encounter a
 # missing link. The boundary commit is not listed as we haven't used the
 # `--boundary` options.
 test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links HEAD >actual &&
-	test_stdout_line_count = 5 cat actual &&
-	! grep -$BOUNDARY_COMMIT actual
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links HEAD >actual.raw &&
+	git -C alt cat-file --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

 # With `--ignore-missing-links` and `--boundary`, we can even print those boundary
 # commits.
 test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
-	test_stdout_line_count = 6 cat actual &&
-	grep -$BOUNDARY_COMMIT actual
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
 '

-# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
-# fail when used alongside `--objects` when a tree is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
-	echo "foo" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
-	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
-	mkdir alt/${TREE_OID:0:2} &&
-	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $TREE_OID actual
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
 '

-# Similar to above, it should also work when a blob is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
-	echo "bar" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
-	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
-	mkdir alt/${BLOB_OID:0:2} &&
-	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $BLOB_OID actual
-'
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $type" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --objects HEAD \
+			>actual &&
+		! grep $oid actual
+	'
+done

 test_done
--- >8 ---

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux