Re: [PATCH] ls-files: fix overeager pathspec optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> I have this memory that _used_ to have a per-filename flag in "git add" 
> that checked if that particular filename component was used or not. But 
> now it just looks at 'dir->ignored_nr' and 'dir->ignored[]'.

Yes, and the previous patch wasn't adding what is ignored to the array, so
here is a re-roll to fix that in addition to the fix to "should the loop
start from checking an empty path?" issue you noticed.

But I am starting to wonder if we might be better off restructuring
read_directory_recursive().  Currently it assumes that the path it was
given _must_ be of interest (i.e. not ignored) and runs excluded() on
subdirectories it finds to make that same decision before recursing into
them or skipping them.  It might make more sense if it first checked if
the path given by the caller should be ignored and act accordingly. If it
is to be ignored, perhaps it will simply return without doing anything
(normal case), or it will return but adds the path to ignored[]
(DIR_COLLECT_IGNORED case), or it will recurse into itself but tell it
that everything it finds is to be ignored.  I dunno...

-- >8 --
Subject: [PATCH (v3)] ls-files: fix overeager pathspec optimization

Given pathspecs that share a common prefix, ls-files optimized its call
into recursive directory reader by starting at the common prefix
directory.

If you have a directory "t" with an untracked file "t/junk" in it, but the
top-level .gitignore file told us to ignore "t/", this resulted in:

    $ git ls-files -o --exclude-standard
    $ git ls-files -o --exclude-standard t/
    t/junk
    $ git ls-files -o --exclude-standard t/junk
    t/junk
    $ cd t && git ls-files -o --exclude-standard
    junk

We could argue that you are overriding the ignore file by giving a
patchspec that matches or being in that directory, but it is somewhat
unexpected.  Worse yet, these behave differently:

    $ git ls-files -o --exclude-standard t/ .
    $ git ls-files -o --exclude-standard t/
    t/junk

This patch changes the optimization so that it notices when the common
prefix directory that it starts reading from is an ignored one.

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 dir.c                              |   36 +++++++++++++++++++++++++++++++++
 t/t3001-ls-files-others-exclude.sh |   39 ++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index d0999ba..e8be909 100644
--- a/dir.c
+++ b/dir.c
@@ -778,6 +778,39 @@ static void free_simplify(struct path_simplify *simplify)
 	free(simplify);
 }
 
+static int has_leading_ignored_dir(struct dir_struct *dir,
+				   const char *path_, int len,
+				   const struct path_simplify *simplify)
+{
+	int dtype = DT_DIR;
+	char path[PATH_MAX], *cp = path;
+
+	memcpy(path, path_, len);
+	while (1) {
+		char *next = memchr(cp, '/', path + len - cp);
+		int exclude;
+
+		/*
+		 * NOTE! NOTE! NOTE!: we might want to actually lstat(2)
+		 * path[] to make sure it is a directory.
+		 */
+		if (next)
+			*next = '\0';
+		exclude = excluded(dir, path, &dtype);
+		if (next)
+			*next = '/';
+		if (exclude) {
+			if (dir->flags & DIR_COLLECT_IGNORED)
+				dir_add_ignored(dir, path, len);
+			return 1;
+		}
+		if (!next)
+			break;
+		cp = next + 1;
+	}
+	return 0;
+}
+
 int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec)
 {
 	struct path_simplify *simplify;
@@ -786,6 +819,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
+	if (has_leading_ignored_dir(dir, path, len, simplify))
+		return dir->nr;
+
 	read_directory_recursive(dir, path, len, 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index c65bca8..9e71260 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -153,4 +153,43 @@ test_expect_success 'negated exclude matches can override previous ones' '
 	grep "^a.1" output
 '
 
+test_expect_success 'subdirectory ignore (setup)' '
+	mkdir -p top/l1/l2 &&
+	(
+		cd top &&
+		git init &&
+		echo /.gitignore >.gitignore &&
+		echo l1 >>.gitignore &&
+		echo l2 >l1/.gitignore &&
+		>l1/l2/l1
+	)
+'
+
+test_expect_success 'subdirectory ignore (toplevel)' '
+	(
+		cd top &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'subdirectory ignore (l1/l2)' '
+	(
+		cd top/l1/l2 &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'subdirectory ignore (l1)' '
+	(
+		cd top/l1 &&
+		git ls-files -o --exclude-standard
+	) >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.6.6.209.g52296.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]