On Fri, Sep 8, 2023 at 9:19 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > The above description needs tightened up a bit, I think. > > What is left unsaid is that you arranged a repository to borrow from > an alternate object directory (or two), and plan to walk objects > with this bit on in the repository, while leaving the alternates > disabled. Without stating that you plan to disable the alternates > while this mode of operation happens, nothing would happen when the > traversal goes from the main to the alternate because no links are > broken, no? > Fair enough, I agree with your points. I'll amend the message to highlight this scenario. > > By exposing this new flag `--ignore-missing-links`, users can set the > > required env variables (GIT_OBJECT_DIRECTORY and > > GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to > > find the boundary objects between object directories. > > This command being a plumbing, there is not much reason to object to > surfacing features that already internally exist to the command line > option. Having said that, > > * Suppose your traversal with --ignore-missing-links from the tip > of a branch reaches a tree object A, and the tree object A has a > link to a blob B and a blob C. But B is in a separate object > store that you usually access via the alternate mechanism. > Instead of barfing "The repository is corrupt---object A points > at object B that does not exist", we pretend that A does not have > the link to B and keep traversing, discovering C and other > objects. > > That much we can read from the above and also the documentation > part of the patch. The interaction with --boundary needs to be > clarified in this description and the documentation, though. It > is unclear if you show 'A' or 'B' in this scenario. Do note that the `--boundary` option only works with commits. Keeping this in mind `--ignore-missing-links` when used with `--boundary` doesn't even traverse non-commit objects. Which means trees/blobs being corrupted shouldn't matter. But I did realize that `--ignore-missing-links` as this patch stands is broken when used alongside the `--objects` flag (`--boundary` doesn't work with `--objects` at the moment, this is something I plan to tackle soon after with a `--boundary-objects` flag). The second version of this patch will have a fix to ensure that even non-commit objects are ignored during traversal if `--objects` option is used. > > * Some traversals use the ignore-missing-links bit implicitly and > currently there is no way to turn it off. Is it plausible that > user may want to explicitly toggle it off, with the option > negated, i.e. --no-ignore-missing-links? I do not immediately > see the utility of such an option, but that is only due to my > lack of imagination. For now, I think it makes sense not to > allow negating this option, until somebody comes up with a useful > use case. > Agreed! > > +--ignore-missing-links:: > > + When an object points to another object that is missing, pretend as if the > > + link did not exist. These missing links are not written to stdout unless > > + the --boundary flag is passed. > > Does "git rev-list" ever writes "links"? I thought not. > > "These missing objects are not written" would be more sensible, but > we never write missing objects with or without the option, so it > is not even worth saying. > > When "--boundary" is passed, do they appear as if they are > available? If not, then the above description is very misleading. > > During traversal, if an object that is referenced does not > exist, pretend as if the reference itself does not exist, > instead of dying of a repository corruption. Running the > command with the "--boundary" option makes these missing > objects, together with the objects on the edge of revision > ranges (i.e. true boundary objects), appear on the output, > prefixed with '-'. > > or something like that, perhaps? > This indeed is better, I've copied and modified it as needed. > > +# With `--ignore-missing-links`, we stop the traversal when we encounter a > > +# missing link. > > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' ' > > + test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD > > +' > > + > > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary > > +# commits. > > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' ' > > + git -C main rev-list --ignore-missing-links --boundary HEAD >list-output && > > + test_stdout_line_count = 6 cat list-output && > > + test_stdout_line_count = 1 cat list-output | grep "^-" > > +' > > These tests are way too loose. Not only you want to see certain > number of boundary objects, you _know_ exactly which object should > be on the boundary, and you should check that instead. That will > allow you to find a mistake to write commit 'A' that refers to a > missing commit 'B', when they wanted to write the missing comit 'B', > as a boundary object, for example. > Fair enough, I will make them more specific and add some tests for missing trees/blobs. > Thanks. Thank you for the review. Will send the next version of the patch soon :)