Re: [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kyle,

Thanks for the clear report and pointing out relevant commit(s) and
discussion.  Apologies in advance if I come off a bit ranty; it's
directed to dir.c and its API and not you.  It seems to be quite the
mess...

On Tue, Dec 3, 2019 at 2:08 PM Kyle Meyer <kyle@xxxxxxxxxx> wrote:
>
> 89a1f4aaf7 (dir: if our pathspec might match files under a dir, recurse
> into it, 2019-09-17) introduced a change in behavior in terms of
> traversing untracked nested repositories.  Say we have a repository that
> contains a single untracked repository with untracked content:
>
>     $ git init && git init a && touch a/x
>
> Calling ls-files with the nested repository as the sole pathspec does
> not recurse into that repository:
>
>     $ git ls-files --other a
>     a/
>
> However, as of 89a1f4aaf7, adding an additional pathspec results in the
> nested repository being traversed:
>
>     $ git ls-files --other a foo
>     a/
>     a/x
>
> Reading 89a1f4aaf7 and skimming the patch series and related thread [*],
> I haven't found anything that makes me think this change in behavior was
> intentional.

Oh man, I guess I shouldn't be surprised by another area of the code
that depends on dir.c but lacks any meaningful tests of its behavior,
violated the existing contract[1], and depends on the side effects of
other bugs -- bugs which don't cover all cases and thus causes it to
get different behavior depending on things that otherwise shouldn't
matter.  Behold, *before* my changes to dir.c:

$ git --version
2.23.0
$ find . -type f | grep -v .git
./empty
./untracked_repo/empty
./untracked_dir/empty
./world
$ git ls-files
world
$ git ls-files -o
empty
untracked_dir/empty
untracked_repo/

So, as you say, it wouldn't traverse into untracked_repo/.  Now we
start adding pathspecs:

$ git ls-files -o untracked_repo
untracked_repo/

As you mentioned, it won't traverse into it even when specified...

$ git ls-files -o untracked_repo/
untracked_repo/empty

...except that it does traverse into this directory if the user tab
completes the name or otherwise manually adds a trailing slash.
Weird, let's try multiple pathspecs:

$ git ls-files -o untracked_dir untracked_repo
untracked_dir/empty
untracked_repo/

$ git ls-files -o untracked_dir untracked_repo/
untracked_dir/empty
untracked_repo/

So it will traverse into the untracked_repo when specified as
'untracked_repo/' but not if there are more than one pathspec given?!?
 And it traverses into an untracked directory regardless of the
trailing slash?  <sarcasm>What a paragon of consistency...</sarcasm>


At least my changes in git-2.24.0 made the behavior consistent; it'll
always traverse into a directory that matches a given pathspec.  As
for whether that's desirable or not when the pathspec is a submodule,
I'm not certain.  My fixes to dir.c stalled out for over a year and a
half despite a few reports that they had fixed issues people were
continuing to report in the wild because the whole traversal logic was
such a mess and there were so many dependencies on existing behavior
built up with that mess that it was really hard to determine what
"correct" behavior was and others seemed to be unwilling to wade
through the muck to figure out where sanity might lie so they could
help give me pointers or opinions on what "correct" was[2].

But here are some possibilities that at least sound sane:

A) ls-files -o should traverse into untracked submodules.  This case
is easy; the code already does that.

B) ls-files -o should NOT traverse into untracked submodules AND
should not even report them.  If so, fix looks like this:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f069a028ce..f144d44d8b 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -301,6 +301,9 @@ static void show_files(struct repository *repo,
struct dir_struct *dir)
        int i;
        struct strbuf fullname = STRBUF_INIT;

+       if (!recurse_submodules)
+               dir->flags |= DIR_SKIP_NESTED_GIT;
+
        /* For cached/deleted files we don't need to even do the readdir */
        if (show_others || show_killed) {
                if (!show_others)

C) ls-files -o should NOT traverse into untracked submodules, but
should at least report their directory name.  If so, the fix is
probably to move the DIR_NO_GITLINKS if-block within
dir.c:treat_directory() to put it right next to the
DIR_SKIP_NESTED_GIT if-block (and maybe even partially combine the
two) so that it comes before some of the other code in that function.
Might have to be careful about checking for the presence of trailing
slashes.


It seems like it should be clear which one of these is "correct", but
I seem to be short a few brain cycles right now...either that, or
maybe my very rare use of submodules and nested repositories means I
just don't have enough context to answer.  Maybe it's obvious to
someone else...


Elijah

[1] https://lore.kernel.org/git/xmqqefjp6sko.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/git/20190905154735.29784-1-newren@xxxxxxxxx/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux