On Tue, Sep 12, 2023 at 05:58:20PM +0200, Karthik Nayak wrote: > The revision backend is used by multiple porcelain commands such as > git-rev-list(1) and git-log(1). The backend currently supports ignoring > missing links by setting the `ignore_missing_links` bit. This allows the > revision walk to skip any objects links which are missing. Expose this > bit via an `--ignore-missing-links` user option. > > A scenario where this option would be used is to find the boundary > objects between different object directories. Consider a repository with > a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate > object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a > repository, enabling this option along with the `--boundary` option for > while disabling the alternate object directory allows us to find the > boundary objects between the main and alternate object directory. > > Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> > --- > > Changes from v1: > 1. Changes in the commit message and option description to be more specific > and list why and what the changes are. > 2. Ensure the new option also works with the existing `--objects` options. > 3. More specific testing for boundary commit. > > Range diff against v1: > > 1: c0a4dca9b0 ! 1: e3f4d85732 revision: add `--ignore-missing-links` user option > @@ Commit message > The revision backend is used by multiple porcelain commands such as > git-rev-list(1) and git-log(1). The backend currently supports ignoring > missing links by setting the `ignore_missing_links` bit. This allows the > - revision walk to skip any objects links which are missing. > + revision walk to skip any objects links which are missing. Expose this > + bit via an `--ignore-missing-links` user option. > > - Currently there is no way to use git-rev-list(1) to traverse the objects > - of the main object directory (GIT_OBJECT_DIRECTORY) and print the > - boundary objects when moving from the main object directory to the > - alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). > - > - By exposing this new flag `--ignore-missing-links`, users can set the > - required env variables (GIT_OBJECT_DIRECTORY and > - GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to > - find the boundary objects between object directories. > + A scenario where this option would be used is to find the boundary > + objects between different object directories. Consider a repository with > + a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate > + object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a > + repository, enabling this option along with the `--boundary` option for > + while disabling the alternate object directory allows us to find the > + boundary objects between the main and alternate object directory. > > Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> > > @@ Documentation/rev-list-options.txt: explicitly. > the bad input was not given. > > +--ignore-missing-links:: > -+ When an object points to another object that is missing, pretend as if the > -+ link did not exist. These missing links are not written to stdout unless > -+ the --boundary flag is passed. > ++ During traversal, if an object that is referenced does not > ++ exist, instead of dying of a repository corruption, pretend as > ++ if the reference itself does not exist. Running the command > ++ with the `--boundary` option makes these missing commits, > ++ together with the commits on the edge of revision ranges > ++ (i.e. true boundary objects), appear on the output, prefixed > ++ with '-'. > + > ifndef::git-rev-list[] > --bisect:: > Pretend as if the bad bisection ref `refs/bisect/bad` > > + ## builtin/rev-list.c ## > +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED, > + { > + struct rev_list_info *info = cb_data; > + if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) { > +- finish_object__ma(obj); > ++ if (!info->revs->ignore_missing_links) > ++ finish_object__ma(obj); > + return 1; > + } > + if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT) > + > ## revision.c ## > @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg > revs->limited = 1; > @@ t/t6022-rev-list-alternates.sh (new) > +test_expect_success 'create repository and alternate directory' ' > + git init main && > + test_commit_bulk -C main 5 && > ++ BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) && > + mkdir alt && > + mv main/.git/objects/* alt && > + GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5 > +' > + > -+# When the alternate odb is provided, all commits are listed. > ++# when the alternate odb is provided, all commits are listed along with the boundary > ++# commit. > +test_expect_success 'rev-list passes with alternate object directory' ' > -+ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD > ++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual && > ++ test_stdout_line_count = 10 cat actual && > ++ grep $BOUNDARY_COMMIT actual > +' > + > +# When the alternate odb is not provided, rev-list fails since the 5th commit's > @@ t/t6022-rev-list-alternates.sh (new) > +' > + > +# With `--ignore-missing-links`, we stop the traversal when we encounter a > -+# missing link. > ++# missing link. The boundary commit is not listed as we haven't used the > ++# `--boundary` options. > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' ' > -+ test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD > ++ git -C main rev-list --ignore-missing-links HEAD >actual && > ++ test_stdout_line_count = 5 cat actual && > ++ ! grep -$BOUNDARY_COMMIT actual > +' > + > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary > +# commits. > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' ' > -+ git -C main rev-list --ignore-missing-links --boundary HEAD >list-output && > -+ test_stdout_line_count = 6 cat list-output && > -+ test_stdout_line_count = 1 cat list-output | grep "^-" > ++ git -C main rev-list --ignore-missing-links --boundary HEAD >actual && > ++ test_stdout_line_count = 6 cat actual && > ++ grep -$BOUNDARY_COMMIT actual > ++' > ++ > ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't > ++# fail when used alongside `--objects` when a tree is missing. > ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' ' > ++ echo "foo" >main/file && > ++ git -C main add file && > ++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" && > ++ TREE_OID=$(git -C main rev-parse HEAD^{tree}) && > ++ mkdir alt/${TREE_OID:0:2} && > ++ mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ && > ++ git -C main rev-list --ignore-missing-links --objects HEAD >actual && > ++ ! grep $TREE_OID actual > ++' > ++ > ++# Similar to above, it should also work when a blob is missing. > ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' ' > ++ echo "bar" >main/file && > ++ git -C main add file && > ++ GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" && > ++ BLOB_OID=$(git -C main rev-parse HEAD:file) && > ++ mkdir alt/${BLOB_OID:0:2} && > ++ mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ && > ++ git -C main rev-list --ignore-missing-links --objects HEAD >actual && > ++ ! grep $BLOB_OID actual > +' > + > +test_done > > > Documentation/rev-list-options.txt | 9 ++++ > builtin/rev-list.c | 3 +- > revision.c | 2 + > t/t6022-rev-list-alternates.sh | 75 ++++++++++++++++++++++++++++++ > 4 files changed, 88 insertions(+), 1 deletion(-) > create mode 100755 t/t6022-rev-list-alternates.sh > > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt > index a4a0cb93b2..8ee713db3d 100644 > --- a/Documentation/rev-list-options.txt > +++ b/Documentation/rev-list-options.txt > @@ -227,6 +227,15 @@ explicitly. > Upon seeing an invalid object name in the input, pretend as if > the bad input was not given. > > +--ignore-missing-links:: > + During traversal, if an object that is referenced does not > + exist, instead of dying of a repository corruption, pretend as > + if the reference itself does not exist. Running the command > + with the `--boundary` option makes these missing commits, > + together with the commits on the edge of revision ranges > + (i.e. true boundary objects), appear on the output, prefixed > + with '-'. > + > ifndef::git-rev-list[] > --bisect:: > Pretend as if the bad bisection ref `refs/bisect/bad` > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index ff715d6918..5239d83c76 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED, > { > struct rev_list_info *info = cb_data; > if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) { > - finish_object__ma(obj); > + if (!info->revs->ignore_missing_links) > + finish_object__ma(obj); > return 1; > } > if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT) > diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh > new file mode 100755 > index 0000000000..08d9ffde5f > --- /dev/null > +++ b/t/t6022-rev-list-alternates.sh > @@ -0,0 +1,75 @@ > +#!/bin/sh > + > +test_description='handling of alternates in rev-list' > + > +TEST_PASSES_SANITIZE_LEAK=true > +. ./test-lib.sh > + > +# We create 5 commits and move them to the alt directory and > +# create 5 more commits which will stay in the main odb. > +test_expect_success 'create repository and alternate directory' ' > + git init main && We don't necessarily have to initialize a repository, as the test suite already does so for us. So we may want to write this instead as: test_commit_bulk 5 && git clone --reference=. --shared . alt && test_commit_bulk -C alt --start=6 5 > +# when the alternate odb is provided, all commits are listed along with the boundary > +# commit. > +test_expect_success 'rev-list passes with alternate object directory' ' > + GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual && > + test_stdout_line_count = 10 cat actual && > + grep $BOUNDARY_COMMIT actual > +' Here, I think we'd want to make sure that we have not just 10 lines of output, but that they are the 10 that we expect, like so: git -C alt rev-list --all --objects --no-object-names >actual.raw && { git rev-list --all --objects --no-object-names && git -C alt rev-list --all --objects --no-object-names --not \ --alternate-refs } >expect.raw && sort actual.raw >actual && sort expect.raw >expect && test_cmp expect actual When reviewing this and tweaking some of the tests locally, I found it useful to have some convenience functions like "hide_alternates" and "show_alternates" to control whether or not "alt" could see its alternate or not. >From my review locally, the resulting changes (which can be applied directly on top of your patch here look like): --- 8< --- diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh index 08d9ffde5f..ef4231b2de 100755 --- a/t/t6022-rev-list-alternates.sh +++ b/t/t6022-rev-list-alternates.sh @@ -8,68 +8,86 @@ TEST_PASSES_SANITIZE_LEAK=true # We create 5 commits and move them to the alt directory and # create 5 more commits which will stay in the main odb. test_expect_success 'create repository and alternate directory' ' - git init main && - test_commit_bulk -C main 5 && - BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) && - mkdir alt && - mv main/.git/objects/* alt && - GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5 + test_commit_bulk 5 && + git clone --reference=. --shared . alt && + test_commit_bulk --start=6 -C alt 5 ' # when the alternate odb is provided, all commits are listed along with the boundary # commit. test_expect_success 'rev-list passes with alternate object directory' ' - GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual && - test_stdout_line_count = 10 cat actual && - grep $BOUNDARY_COMMIT actual + git -C alt rev-list --all --objects --no-object-names >actual.raw && + { + git rev-list --all --objects --no-object-names && + git -C alt rev-list --all --objects --no-object-names --not \ + --alternate-refs + } >expect.raw && + sort actual.raw >actual && + sort expect.raw >expect && + test_cmp expect actual ' +alt=alt/.git/objects/info/alternates + +hide_alternates () { + test -f "$alt.bak" || mv "$alt" "$alt.bak" +} + +show_alternates () { + test -f "$alt" || mv "$alt.bak" "$alt" +} + # When the alternate odb is not provided, rev-list fails since the 5th commit's # parent is not present in the main odb. test_expect_success 'rev-list fails without alternate object directory' ' - test_must_fail git -C main rev-list HEAD + hide_alternates && + test_must_fail git -C alt rev-list HEAD ' # With `--ignore-missing-links`, we stop the traversal when we encounter a # missing link. The boundary commit is not listed as we haven't used the # `--boundary` options. test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' ' - git -C main rev-list --ignore-missing-links HEAD >actual && - test_stdout_line_count = 5 cat actual && - ! grep -$BOUNDARY_COMMIT actual + hide_alternates && + + git -C alt rev-list --objects --no-object-names \ + --ignore-missing-links HEAD >actual.raw && + git -C alt cat-file --batch-check="%(objectname)" \ + --batch-all-objects >expect.raw && + + sort actual.raw >actual && + sort expect.raw >expect && + test_cmp expect actual ' # With `--ignore-missing-links` and `--boundary`, we can even print those boundary # commits. test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' ' - git -C main rev-list --ignore-missing-links --boundary HEAD >actual && - test_stdout_line_count = 6 cat actual && - grep -$BOUNDARY_COMMIT actual + git -C alt rev-list --ignore-missing-links --boundary HEAD >got && + grep "^-$(git rev-parse HEAD)" got ' -# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't -# fail when used alongside `--objects` when a tree is missing. -test_expect_success 'rev-list --ignore-missing-links works with missing tree' ' - echo "foo" >main/file && - git -C main add file && - GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" && - TREE_OID=$(git -C main rev-parse HEAD^{tree}) && - mkdir alt/${TREE_OID:0:2} && - mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ && - git -C main rev-list --ignore-missing-links --objects HEAD >actual && - ! grep $TREE_OID actual +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" ' + show_alternates && + test_commit -C alt 11 ' -# Similar to above, it should also work when a blob is missing. -test_expect_success 'rev-list --ignore-missing-links works with missing blob' ' - echo "bar" >main/file && - git -C main add file && - GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" && - BLOB_OID=$(git -C main rev-parse HEAD:file) && - mkdir alt/${BLOB_OID:0:2} && - mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ && - git -C main rev-list --ignore-missing-links --objects HEAD >actual && - ! grep $BLOB_OID actual -' +for obj in "HEAD^{tree}" "HEAD:11.t" +do + # The `--ignore-missing-links` option should ensure that git-rev-list(1) + # doesn't fail when used alongside `--objects` when a tree/blob is + # missing. + test_expect_success "rev-list --ignore-missing-links with missing $type" ' + oid="$(git -C alt rev-parse $obj)" && + path="alt/.git/objects/$(test_oid_to_path $oid)" && + + mv "$path" "$path.hidden" && + test_when_finished "mv $path.hidden $path" && + + git -C alt rev-list --ignore-missing-links --objects HEAD \ + >actual && + ! grep $oid actual + ' +done test_done --- >8 --- Thanks, Taylor