Re: [PATCH v4] revision: add `--ignore-missing-links` user option

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 20 Sep 2023 08:32:07 -0700

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>  
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.

This needs an explanation of how it interacts with the
--missing=<action> option, no?  "--missing=allow-any" and
"--missing=print" are sensible choices, I presume.  The former allows
the traversal to proceed, as you described in one of your responses.
With "--missing=print", the user can also find out more directly which
objects are missing, without using "--boundary", which requires them
to sift the missing objects from the objects that are truly on the
boundary.

Here is my attempt:

        --ignore-missing-links::
                During traversal, if an object that is referenced does not
                exist, instead of dying of a repository corruption, allow
                `--missing=<missing-action>` to decide what to do.
        +
        `--missing=print` will make the command print a list of missing
        objects, prefixed with a "?" character.
        +
        `--missing=allow-any` will make the command proceed without doing
        anything special.  When used with `--boundary`, the missing
        objects are output together with the commits on the edge of
        revision ranges, all prefixed with a "-" character.

It might make sense to add

        +
        Use of this option with other 'missing-action' values is
        unlikely to give useful behaviour.

at the end, but it may not be useful to the readers to say "we allow
even more extra flexibility but haven't thought through what good
they would do".

> +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> +# missing link. The boundary commit is not listed as we haven't used the
> +# `--boundary` options.
> +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> +	hide_alternates &&
> +
> +	git -C alt rev-list --objects --no-object-names \
> +		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
> +	git -C alt cat-file  --batch-check="%(objectname)" \
> +		--batch-all-objects >expect.raw &&
> +
> +	sort actual.raw >actual &&
> +	sort expect.raw >expect &&
> +	test_cmp expect actual
> +'

This gives a good baseline.  "--missing=print" without "--boundary"
may have more obvious use cases, but is there a practical use case
for the output from an invocation with "--missing=allow-any" without
"--boundary"?  Just being curious if I am missing something obvious.

Perhaps add another test that uses "--missing=print" instead, and
check that the "? missing" output matches what we expect to be
missing?  The same comment applies to the other test that uses
"--missing=allow-any" without "--boundary" we see later.
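
For the "--missing=print" variant, the checking part could look
something like the sketch below.  It is plain shell rather than a
ready-made test: split_missing is a made-up helper name, the object
names are placeholders, and in the real test the raw file would come
from the rev-list invocation instead of a here-document.

```shell
# split_missing is a hypothetical helper, not part of the test suite.
# It splits "--missing=print" style output into two sorted files:
# "<prefix>.missing" (lines that had the "?" prefix, prefix removed)
# and "<prefix>.present" (everything else).
split_missing () {
	sed -n "s/^?//p" "$1" | sort >"$2.missing" &&
	sed "/^?/d" "$1" | sort >"$2.present"
}

tmp=$(mktemp -d) &&

# Placeholder object names standing in for real rev-list output.
cat >"$tmp/actual.raw" <<\EOF &&
1111111111111111111111111111111111111111
?2222222222222222222222222222222222222222
3333333333333333333333333333333333333333
EOF

split_missing "$tmp/actual.raw" "$tmp/actual" &&
cat "$tmp/actual.missing"
```

The two resulting files can then each be compared with test_cmp
against the list of objects we expect to be missing and the list we
expect to be present.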

> +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> +# commits.
> +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> +	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
> +	grep "^-$(git rev-parse HEAD)" got
> +'

This makes sure what we expect to appear in 'got' actually is in
'got', but we should also make sure 'got' does not have anything
unexpected.  
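
To illustrate the difference, here is a plain-shell sketch with
placeholder object names.  same_lines is a made-up helper; in the test
script, test_cmp on sorted files would play that role.

```shell
# same_lines is a hypothetical helper: succeed only if files "$1" and
# "$2" hold exactly the same lines, in any order.
same_lines () {
	sort "$1" >"$1.sorted" &&
	sort "$2" >"$2.sorted" &&
	cmp -s "$1.sorted" "$2.sorted"
}

tmp=$(mktemp -d) &&

# Placeholder output: the boundary line we want, plus one stray line.
cat >"$tmp/got" <<\EOF &&
-1111111111111111111111111111111111111111
2222222222222222222222222222222222222222
EOF
echo "-1111111111111111111111111111111111111111" >"$tmp/expect" &&

# The grep-only check is satisfied even though there is a stray line ...
grep -q "^-1111111111111111111111111111111111111111" "$tmp/got" &&

# ... while a whole-file comparison notices it.
if same_lines "$tmp/expect" "$tmp/got"
then
	echo "same"
else
	echo "different"
fi
```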

> +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
> +	show_alternates &&
> +	test_commit -C alt 11
> +'
> +
> +for obj in "HEAD^{tree}" "HEAD:11.t"
> +do
> +	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
> +	# doesn't fail when used alongside `--objects` when a tree/blob is
> +	# missing.
> +	test_expect_success "rev-list --ignore-missing-links with missing $type" '
> +		oid="$(git -C alt rev-parse $obj)" &&
> +		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
> +
> +		mv "$path" "$path.hidden" &&
> +		test_when_finished "mv $path.hidden $path" &&

In the first iteration, we run with the tree object removed and only
ensure that the removed tree does not appear in the output.  But we
know the blob that is referenced by that removed tree will not appear
in the output either, don't we?  Don't we want to check that, too?

In the second iteration, we have resurrected the tree but removed
the blob that is referenced by the tree, so we would not see that
blob in the output, which makes sense.
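
For the first iteration, a check that covers both the tree and the
blob it references could be sketched as follows.  This is plain shell
with placeholder object names, and none_listed is a made-up helper
name, not something the test suite provides.

```shell
# none_listed is a hypothetical helper: succeed only if none of the
# object names given after the file name appear in that file.
none_listed () {
	file=$1 &&
	shift &&
	for oid
	do
		if grep -q "$oid" "$file"
		then
			echo >&2 "unexpected object in output: $oid"
			return 1
		fi
	done
}

tmp=$(mktemp -d) &&

# Placeholder rev-list output: only the commit survives.
echo "1111111111111111111111111111111111111111" >"$tmp/actual" &&

# Placeholder names for the hidden tree and the blob it references.
tree=2222222222222222222222222222222222222222 &&
blob=3333333333333333333333333333333333333333 &&

none_listed "$tmp/actual" "$tree" "$blob" &&
echo "neither the tree nor its blob is listed"
```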

> +		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
> +			>actual &&
> +		! grep $oid actual
> +       '
> +done
> +
> +test_done

Thanks.