Re: [PATCH 3/3] rev-list: add --allow-missing-tips to be used with --missing=...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 7, 2024 at 10:40 AM Linus Arver <linusa@xxxxxxxxxx> wrote:
>
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > In 9830926c7d (rev-list: add commit object support in `--missing`
> > option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
> > so that it now works with commits too.
> >
> > Unfortunately, such a command would still fail with a "fatal: bad
> > object <oid>" if it is passed a missing commit, blob or tree as an
> > argument.
>
> Hmm, but according to the manpage for rev-list, it only accepts commits
> as arguments. Conflict...?

It actually kind of "works" when you pass blobs or trees. It looks
like the command just doesn't use them (except for checking that they
actually exist) unless options like --objects, --missing=print and
perhaps others are used. So yeah, the doc might need an update, but I
think it could be a separate issue, as it's not new and doesn't depend
on this small series.

> Also it took me a second to realize that you are talking about
> missing objects passed in at the command line to `git rev-list` and not
> the objects that this command encounters during the normal rev walking
> process. It would have been a touch nicer to say something more
> explicit, like
>
>     In 9830926c7d (rev-list: add commit object support in `--missing`
>     option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
>     so that it works with missing commits, not just blobs/trees.
>
>     Unfortunately, it would still fail with a "fatal: bad
>     object <oid>" if it is passed a missing commit as an
>     argument (before the rev walking even begins).

Thanks, I have used your suggestions in my current version, except
that I still use "passed a missing commit, blob or tree" instead of
"passed a missing commit" as the command also accepts blobs and trees.

> > When such a command is used to find the dependencies of some objects,
> > for example the dependencies of quarantined objects, it would be
> > better if the command would instead consider such missing objects,
> > especially commits, in the same way as other missing objects.
>
> Could you define what quarantined objects are (it's not in the manpage
> for rev-list)?

"quarantined objects" refers to the following doc:

https://www.git-scm.com/docs/git-receive-pack#_quarantine_environment

So to clarify things, the above paragraph looks like the following now:

"When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects (see the
"QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
documentation), it would be better if the command would instead
consider such missing objects, especially commits, in the same way as
other missing objects."

> But also, I'm not sure how much additional value this
> paragraph is adding on top of what you already said in the first two
> paragraphs.

It's a more specific example, and it's how we detected the issue we
would like to address, so I think it has some value. I am Ok with
removing it, if others don't see some value in it though.

> > If, for example `--missing=print` is used, it would be nice for some
> > use cases if the missing tips passed as arguments were reported in
> > the same way as other missing objects instead of the command just
> > failing.
> >
> > Let's introduce a new `--allow-missing-tips` option to make it work
> > like this.
>
> I assume "tips" means "the starting commits which are passed into this
> command from which it begins the rev walk". Might be worth including in
> the commit message.

I think "tips" is better than "commits" because blobs and trees are
allowed (see above).

> But also, it's curious that you chose the word "allow" when we already
> have "--ignore-missing". I assume this is because you still want the
> missing tips to still be displayed in the output, and not ignored
> (omitted from processing altogether). Perhaps "--report-missing-tips"
> is more explicit?

I think "allow" is better as it makes it clearer that the command will
not fail while it should fail without the option.

Anyway I think I will implement Junio's suggestion to just avoid
adding a new option and just change the behavior of the command when
--missing=[print|allow*] is used.

> > @@ -753,9 +765,19 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >
> >       if (arg_print_omitted)
> >               oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
> > -     if (arg_missing_action == MA_PRINT)
> > +     if (arg_missing_action == MA_PRINT) {
> > +             struct oidset_iter iter;
> > +             struct object_id *oid;
> > +
> >               oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
> >
> > +             /* Already add missing commits */
>
> Did you mean "Add already-missing commits"? Perhaps even more explicit
> would be "Add missing tips"?

Yeah, right. I have changed it in my current version.

> > +             oidset_iter_init(&revs.missing_commits, &iter);
> > +             while ((oid = oidset_iter_next(&iter)))
> > +                     oidset_insert(&missing_objects, oid);
> > +             oidset_clear(&revs.missing_commits);
> > +     }
>
> Aside: I am surprised that there is no helper function already that
> simply copies things in one oidset into another oidset.

Yeah, I was surprised too. I only found the following similar code in
list-objects-filter.c:

    oidset_iter_init(src, &iter);
    while ((src_oid = oidset_iter_next(&iter)) != NULL)
        oidset_insert(dest, src_oid);

So yeah perhaps we could create such a helper function.

> >       traverse_commit_list_filtered(
> >               &revs, show_commit, show_object, &info,
> >               (arg_print_omitted ? &omitted_objects : NULL));
> > diff --git a/revision.c b/revision.c
> > index 4c5cd7c3ce..9f25faa249 100644
> > --- a/revision.c
> > +++ b/revision.c
> >
> > [...]
> >
> > @@ -2184,7 +2189,7 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
> >               verify_non_filename(revs->prefix, arg);
> >       object = get_reference(revs, arg, &oid, flags ^ local_flags);
> >       if (!object)
> > -             return revs->ignore_missing ? 0 : -1;
> > +             return (revs->ignore_missing || revs->do_not_die_on_missing_tips) ? 0 : -1;
> >       add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
> >       add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
> >       free(oc.path);
>
> With a few more context lines, the diff looks like
>
> --8<---------------cut here---------------start------------->8---
>         if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
>                 return revs->ignore_missing ? 0 : -1;
>         if (!cant_be_filename)
>                 verify_non_filename(revs->prefix, arg);
>         object = get_reference(revs, arg, &oid, flags ^ local_flags);
>         if (!object)
> -               return revs->ignore_missing ? 0 : -1;
> +               return (revs->ignore_missing || revs->do_not_die_on_missing_tips) ? 0 : -1;
>         add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
>         add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
>         free(oc.path);
>         return 0;
> --8<---------------cut here---------------end--------------->8---
>
> and it's not obvious to me why you only touched the "if (!object)" case
> but not the "if (get_oid_with_context(...))" case. Are there any
> subtleties here that would not be obvious to reviewers?

I thought that if we can't get an oid, we cannot put anything in the
'missing_commit' oidset anyway, so it might be better to error out.

> > @@ -3830,8 +3835,6 @@ int prepare_revision_walk(struct rev_info *revs)
> >                                      FOR_EACH_OBJECT_PROMISOR_ONLY);
> >       }
> >
> > -     oidset_init(&revs->missing_commits, 0);
> > -
> >       if (!revs->reflog_info)
> >               prepare_to_use_bloom_filter(revs);
> >       if (!revs->unsorted_input)
> > diff --git a/revision.h b/revision.h
> > index 94c43138bc..67435a5d8a 100644
> > --- a/revision.h
> > +++ b/revision.h
> > @@ -227,6 +227,14 @@ struct rev_info {
> >                        */
> >                       do_not_die_on_missing_objects:1,
> >
> > +                     /*
> > +                      * When the do_not_die_on_missing_objects flag above is set,
> > +                      * a rev walk could still die with "fatal: bad object <oid>"
> > +                      * if one of the tips it is passed is missing. With this flag
> > +                      * such a tip will be reported as missing too.
> > +                      */
> > +                      do_not_die_on_missing_tips:1,
> > +
> >                       /* for internal use only */
> >                       exclude_promisor_objects:1;
>
> IIUC, we won't get to even do the rev walk though if all tips are
> missing. So perhaps a more precise comment would be
>
>     /*
>      * When the do_not_die_on_missing_objects flag above is set,
>      * we could prematurely die with "fatal: bad object <oid>"
>      * if one of the tips it is passed is missing before even getting to
>      * the rev walk. With this flag such a tip will be reported as
>      * missing in the output after the rev walk completes.
>      */

Thanks for the suggestion, I have now:

            /*
             * When the do_not_die_on_missing_objects flag above is set,
             * we could prematurely die with "fatal: bad object <oid>"
             * if one of the tips passed to a rev walk is missing, before
             * the rev walk even starts. With this flag such a tip will
             * be reported as missing in the output after the rev walk
             * completes.
             */
             do_not_die_on_missing_tips:1,


> > diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
> > index 527aa94f07..283e8fc2c2 100755
> > --- a/t/t6022-rev-list-missing.sh
> > +++ b/t/t6022-rev-list-missing.sh
> > @@ -77,4 +77,55 @@ do
> >       done
> >  done
> >
> > +for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
> > +do
> > +     for tip in "" "HEAD"
>
> So far from the patch diff it was not obvious that you wanted to check
> for the empty string as a "tip".

We want to check that things work as expected both:

  - when we pass only one missing tip, and:
  - when we pass one missing tip and a tip that is not missing.

We are setting 'tip' to "" in the former case so that when we will use
"$oid $tip" in the `git rev-list --missing=$action
--allow-missing-tips ...` command below, it will be like we passed
only "$oid" to the command.

> > +     do
> > +             for action in "allow-any" "print"
> > +             do
> > +                     test_expect_success "--missing=$action --allow-missing-tips with tip '$obj' missing and tip '$tip'" '
>
> If I run this test without the new --allow-missing-tips flag, I get
>
>     fatal: bad object 82335b23aa7234320d6f8055243c852e4b6a5ca3
>     not ok 11 - --missing=allow-any --allow-missing-tips with tip 'HEAD~1' missing and tip ''
>
> and I think the last "tip ''" part is misleading because IIUC it's not
> actually passed in as a tip (see my comment below about shell quoting).
>
> > +                             oid="$(git rev-parse $obj)" &&
> > +                             path=".git/objects/$(test_oid_to_path $oid)" &&
> > +
> > +                             # Before the object is made missing, we use rev-list to
> > +                             # get the expected oids.
> > +                             if [ "$tip" = "HEAD" ]; then
> > +                                     git rev-list --objects --no-object-names \
> > +                                             HEAD ^$obj >expect.raw
> > +                             else
> > +                                     >expect.raw
> > +                             fi &&
>
> OK so you are using this empty string to clear the expect.raw file. If
> that's true then what stops us from doing this at the start of every
> iteration?

I am not sure what you mean. Both:

    git rev-list --objects --no-object-names HEAD ^$obj >expect.raw

and:

    >expect.raw #2

are clearing "expect.raw" as ">expect.raw" is used in both cases.

If we did the latter at the start of every iteration, then when we do
the former afterwards, we would clear "expect.raw" raw again, while
there is no point in clearing it twice.

> > +                             # Blobs are shared by all commits, so even though a commit/tree
> > +                             # might be skipped, its blob must be accounted for.
> > +                             if [ "$tip" = "HEAD" ] && [ $obj != "HEAD:1.t" ]; then
> > +                                     echo $(git rev-parse HEAD:1.t) >>expect.raw &&
> > +                                     echo $(git rev-parse HEAD:2.t) >>expect.raw
> > +                             fi &&
> > +
> > +                             mv "$path" "$path.hidden" &&
> > +                             test_when_finished "mv $path.hidden $path" &&
> > +
> > +                             git rev-list --missing=$action --allow-missing-tips \
> > +                                  --objects --no-object-names $oid $tip >actual.raw &&
>
> The fact that $tip is not quoted here means that this is equivalent to
>
>     --objects --no-object-names $oid >actual.raw &&
>
> and not
>
>     --objects --no-object-names $oid "" >actual.raw &&

Yeah, right, and that's what we want.

> for the case where tip is the empty string. I wonder if we could avoid
> being nuanced here with subtle shell quoting rules, especially because
> these are tests (where making everything as explicit as possible is
> desirable).

I think it's not very subtle, but perhaps a comment would help. So I
added the following to my current version before the loop:

    # We want to check that things work when both
    #   - all the tips passed are missing (case tip = ""), and
    #   - there is one missing tip and one existing tip (case tip = "HEAD")

> > +                             # When the action is to print, we should also add the missing
> > +                             # oid to the expect list.
> > +                             case $action in
> > +                             allow-any)
> > +                                     ;;
> > +                             print)
> > +                                     grep ?$oid actual.raw &&
> > +                                     echo ?$oid >>expect.raw
> > +                                     ;;
> > +                             esac &&
> > +
> > +                             sort actual.raw >actual &&
> > +                             sort expect.raw >expect &&
> > +                             test_cmp expect actual
> > +                     '
> > +             done
> > +     done
> > +done
> > +
>
> Wait, but where are we actually making the $tip commit's object
> _missing_? Hmm...
>
> Ah, actually the tips are just the $obj variables. So it seems like you
> could have done
>
>     for tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
>
> in the beginning to make it more obvious.

The previous tests in this test script used "oid" as the variable name
for possibly missing objects, so I did the same in the tests I added.
I could have used "tip" and "extra_tip" instead of "oid" and "tip" or
perhaps "oid" and "extra_oid", but I don't think it makes a big
difference in understanding these tests.

> Also, given how most of this is identical from the loop already in t6022
> just above it, it would help to add a comment at the top of this one
> explaining how it's different than the one above it.

The one I added has an extra `for tip in "" "HEAD"` loop. Anyway I
think I will just modify the existing test in the next version I will
send, as I plan to implement Junio's suggestion and there will be no
new option.

Thanks for the review!





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux