Re: [PATCH] revision: add `--ignore-missing-links` user option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 8, 2023 at 9:19 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> The above description needs tightened up a bit, I think.
>
> What is left unsaid is that you arranged a repository to borrow from
> an alternate object directory (or two), and plan to walk objects
> with this bit on in the repository, while leaving the alternates
> disabled.  Without stating that you plan to disable the alternates
> while this mode of operation happens, nothing would happen when the
> traversal goes from the main to the alternate because no links are
> broken, no?
>

Fair enough, I agree with your points. I'll amend the message to highlight this
scenario.

> > By exposing this new flag `--ignore-missing-links`, users can set the
> > required env variables (GIT_OBJECT_DIRECTORY and
> > GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
> > find the boundary objects between object directories.
>
> This command being a plumbing, there is not much reason to object to
> surfacing features that already internally exist to the command line
> option.    Having said that,
>
>  * Suppose your traversal with --ignore-missing-links from the tip
>    of a branch reaches a tree object A, and the tree object A has a
>    link to a blob B and a blob C.  But B is in a separate object
>    store that you usually access via the alternate mechanism.
>    Instead of barfing "The repository is corrupt---object A points
>    at object B that does not exist", we pretend that A does not have
>    the link to B and keep traversing, discovering C and other
>    objects.
>
>    That much we can read from the above and also the documentation
>    part of the patch.  The interaction with --boundary needs to be
>    clarified in this description and the documentation, though.  It
>    is unclear if you show 'A' or 'B' in this scenario.

Do note that the `--boundary` option only works with commits. Keeping this in
mind `--ignore-missing-links` when used with `--boundary` doesn't even traverse
non-commit objects. Which means trees/blobs being corrupted shouldn't matter.

But I did realize that `--ignore-missing-links` as this patch stands
is broken when
used alongside the `--objects` flag (`--boundary` doesn't work with
`--objects` at the
moment, this is something I plan to tackle soon after with a
`--boundary-objects` flag).
The second version of this patch will have a fix to ensure that even
non-commit objects
are ignored during traversal if `--objects` option is used.

>
>  * Some traversals use the ignore-missing-links bit implicitly and
>    currently there is no way to turn it off.  Is it plausible that
>    user may want to explicitly toggle it off, with the option
>    negated, i.e. --no-ignore-missing-links?  I do not immediately
>    see the utility of such an option, but that is only due to my
>    lack of imagination.  For now, I think it makes sense not to
>    allow negating this option, until somebody comes up with a useful
>    use case.
>

Agreed!

> > +--ignore-missing-links::
> > +     When an object points to another object that is missing, pretend as if the
> > +     link did not exist. These missing links are not written to stdout unless
> > +     the --boundary flag is passed.
>
> Does "git rev-list" ever writes "links"?  I thought not.
>
> "These missing objects are not written" would be more sensible, but
> we never write missing objects with or without the option, so it
> is not even worth saying.
>
> When "--boundary" is passed, do they appear as if they are
> available?  If not, then the above description is very misleading.
>
>     During traversal, if an object that is referenced does not
>     exist, pretend as if the reference itself does not exist,
>     instead of dying of a repository corruption.  Running the
>     command with the "--boundary" option makes these missing
>     objects, together with the objects on the edge of revision
>     ranges (i.e. true boundary objects), appear on the output,
>     prefixed with '-'.
>
> or something like that, perhaps?
>

This indeed is better, I've copied and modified it as needed.

> > +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> > +# missing link.
> > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> > +     test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
> > +'
> > +
> > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> > +# commits.
> > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> > +     git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
> > +     test_stdout_line_count = 6 cat list-output &&
> > +     test_stdout_line_count = 1 cat list-output | grep "^-"
> > +'
>
> These tests are way too loose.  Not only you want to see certain
> number of boundary objects, you _know_ exactly which object should
> be on the boundary, and you should check that instead.  That will
> allow you to find a mistake to write commit 'A' that refers to a
> missing commit 'B', when they wanted to write the missing comit 'B',
> as a boundary object, for example.
>

Fair enough, I will make them more specific and add some tests for
missing trees/blobs.

> Thanks.

Thank you for the review. Will send the next version of the patch soon :)




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux