Re: [PATCH 3/3] rev-list: add --allow-missing-tips to be used with --missing=...

Christian Couder <christian.couder@xxxxxxxxx> · Fri, 2 Feb 2024 12:29:25 +0100

On Thu, Feb 1, 2024 at 9:20 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > In 9830926c7d (rev-list: add commit object support in `--missing`
> > option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
> > so that it now works with commits too.
> >
> > Unfortunately, such a command would still fail with a "fatal: bad
> > object <oid>" if it is passed a missing commit, blob or tree as an
> > argument.
> >
> > When such a command is used to find the dependencies of some objects,
> > for example the dependencies of quarantined objects, it would be
> > better if the command would instead consider such missing objects,
> > especially commits, in the same way as other missing objects.
> >
> > If, for example `--missing=print` is used, it would be nice for some
> > use cases if the missing tips passed as arguments were reported in
> > the same way as other missing objects instead of the command just
> > failing.
> >
> > Let's introduce a new `--allow-missing-tips` option to make it work
> > like this.
>
> OK.  Unlike a missing object referenced by a tree, a commit, or a
> tag, where the expected type of the missing object is known to Git,
> I would expect that nobody knowsn what type these missing objects at
> the tip have.  So do we now require "object X missing" instead of
> "commit X missing" in the output?  If we are not giving any type
> information even when we know what the type we expect is, then we do
> not have to worry about this change introducing a new output logic,
> I guess.

Yeah, we are not giving type information in the output, only a `?`
before the oid.

> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index b3f4783858..ae7bb15478 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -562,6 +562,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >                               break;
> >               }
> >       }
> > +     for (i = 1; i < argc; i++) {
> > +             const char *arg = argv[i];
> > +             if (!strcmp(arg, "--allow-missing-tips")) {
> > +                     if (arg_missing_action == MA_ERROR)
> > +                             die(_("option '%s' only makes sense with '%s' set to '%s' or '%s'"),
> > +                                   "--allow-missing-tips", "--missing=", "allow-*", "print");
> > +                     revs.do_not_die_on_missing_tips = 1;
> > +                     break;
> > +             }
> > +     }
>
> It is unfortunate that we need to add yet another dumb loop that
> does not even try to understand there may be an option whose value
> happens to be the string strcmp() looks for (we already have two
> such loops above this hunk).  I have to wonder if we can do better.

I am also not happy with adding yet another dump loop like this. I did
it because I couldn't think of a simple solution to avoid that.

> An idle piece of idea.  Perhaps we can instruct setup_revisions()
> not to die immediately upon seeing a problematic argument, marking
> the revs as "broken" instead, and keep going and interpreting as
> much as it could, so that it may record the presence of "--missing",
> "--exclude-promisor-objects", and "--allow-missing-tips".  Then upon
> seeing setup_revisions() return with such an error, we can redo the
> call with these bits already on.

When we discussed rejecting some rev walking options before calling
setup_revisions() in the context of `git replay`, one thing people
seemed to agree on was that there should be a mechanism in
setup_revisions() that allows us to tell setup_revisions() which rev
walking options it should allow or not. I think that such a mechanism
might need an early parsing of options which could be reused for
options like `--exclude-promisor-objects`, `--missing=...`  and
`--allow-missing-tips`. But I think adding such a mechanism does not
belong to this patch series. It's some cleanup and refactoring that we
might have needed for a long time already.

In my current version, I have added the following NEEDSWORK comment
before the three dump loops to make sure we remember about this:

    /*
     * NEEDSWORK: These dump loops to look for some options early
     * are ugly. We really need setup_revisions() to have a
     * mechanism to allow and disallow some sets of options for
     * different commands (like rev-list, replay, etc). Such
     * mechanism should do an early parsing of option and be able
     * to manage the `--exclude-promisor-objects`, `--missing=...`
     * and `--allow-missing-tips` options below.
     */

I have also added the following to the commit message:

"We introduce another dump loop to look for that options early, but
addressing the dump loops should likely be part of adding a new
mechanism to setup_revisions() which is a different topic. Anyway
let's add a big NEEDSWORK comment to remember that we really need to
do this."

> Anyway, I digress.
>
> > diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
> > index 527aa94f07..283e8fc2c2 100755
> > --- a/t/t6022-rev-list-missing.sh
> > +++ b/t/t6022-rev-list-missing.sh
> > @@ -77,4 +77,55 @@ do
> >       done
> >  done
> >
> > +for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
> > +do
> > +     for tip in "" "HEAD"
> > +     do
> > +             for action in "allow-any" "print"
> > +             do
> > +                     test_expect_success "--missing=$action --allow-missing-tips with tip '$obj' missing and tip '$tip'" '
> > +                             oid="$(git rev-parse $obj)" &&
> > +                             path=".git/objects/$(test_oid_to_path $oid)" &&
> > +
> > +                             # Before the object is made missing, we use rev-list to
> > +                             # get the expected oids.
> > +                             if [ "$tip" = "HEAD" ]; then
>
> Style:
>
>                                 if test "$tip" = "HEAD"
>                                 then

Fixed in my current version. In it, I also fixed in patch 2/3 a
similar style typo in the tests before these ones.

> > +                                     git rev-list --objects --no-object-names \
> > +                                             HEAD ^$obj >expect.raw
> > +                             else
> > +                                     >expect.raw
> > +                             fi &&
> > +
> > +                             # Blobs are shared by all commits, so even though a commit/tree
> > +                             # might be skipped, its blob must be accounted for.
> > +                             if [ "$tip" = "HEAD" ] && [ $obj != "HEAD:1.t" ]; then
>
> Ditto.

Fixed too in my current version.

> > +                                     echo $(git rev-parse HEAD:1.t) >>expect.raw &&
> > +                                     echo $(git rev-parse HEAD:2.t) >>expect.raw
> > +                             fi &&
> > +
> > +                             mv "$path" "$path.hidden" &&
> > +                             test_when_finished "mv $path.hidden $path" &&
> > +
> > +                             git rev-list --missing=$action --allow-missing-tips \
> > +                                  --objects --no-object-names $oid $tip >actual.raw &&
> > +
> > +                             # When the action is to print, we should also add the missing
> > +                             # oid to the expect list.
> > +                             case $action in
> > +                             allow-any)
> > +                                     ;;
> > +                             print)
> > +                                     grep ?$oid actual.raw &&
> > +                                     echo ?$oid >>expect.raw
> > +                                     ;;
>
> OK.  We do not say anything more than the object name (and the fact
> that it is missing with a single byte '?'), so my earlier worry was
> unfounded.  Good.
>
> > +                             esac &&
> > +
> > +                             sort actual.raw >actual &&
> > +                             sort expect.raw >expect &&
> > +                             test_cmp expect actual
> > +                     '
> > +             done
> > +     done
> > +done
> > +
> >  test_done
>
> THanks.

Thanks for the review!