Samuel Lijin <sxlijin@xxxxxxxxx> writes: > When we taught read_directory_recursive() to recurse into untracked > directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that > had the side effect of teaching it to collect the untracked contents of > untracked directories. It doesn't always make sense to return these, > though (we do need them for `clean -d`), so we introduce a flag > (DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory() > strips dir->entries of the untracked contents of untracked dirs. > > We also introduce check_contains() to check if one dir_entry corresponds > to a path which contains the path corresponding to another dir_entry. > > Signed-off-by: Samuel Lijin <sxlijin@xxxxxxxxx> > --- > dir.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > dir.h | 3 ++- > 2 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/dir.c b/dir.c > index 6bd0350e9..214a148ee 100644 > --- a/dir.c > +++ b/dir.c > @@ -1852,6 +1852,14 @@ static int cmp_name(const void *p1, const void *p2) > return name_compare(e1->name, e1->len, e2->name, e2->len); > } > > +/* check if *out lexically contains *in */ > +static int check_contains(const struct dir_entry *out, const struct dir_entry *in) > +{ > + return (out->len < in->len) && > + (out->name[out->len - 1] == '/') && > + !memcmp(out->name, in->name, out->len); > +} OK, treat_one_path() and treat_pah_fast() both ensure that a path to a directory is terminated with '/' before calling dir_add_name() and dir_add_ignored(), so we know a dir_entry "out" that is a directory must end with '/'. Good. The second and third line being overly indented is a bit distracting, though. > static int treat_leading_path(struct dir_struct *dir, > const char *path, int len, > const struct pathspec *pathspec) > @@ -2067,6 +2075,52 @@ int read_directory(struct dir_struct *dir, const char *path, > read_directory_recursive(dir, path, len, untracked, 0, pathspec); > QSORT(dir->entries, dir->nr, cmp_name); > QSORT(dir->ignored, dir->ignored_nr, cmp_name); > + > + // if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick > + // up untracked contents of untracked dirs; by default we discard these, > + // but given DIR_KEEP_UNTRACKED_CONTENTS we do not /* * Our multi-line comments are formatted like this * example. No C++/C99 // comments, outside of * borrowed code and platform specific compat/ code, * please. */ > + if ((dir->flags & DIR_SHOW_IGNORED_TOO) > + && !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) { Both having && at the end and && at the beginning are valid C, but please stick to one style in a single file. > + int i, j, nr_removed = 0; > + > + // remove from dir->entries untracked contents of untracked dirs /* And our single-liner comments look like this */ > + for (i = 0; i < dir->nr; i++) { > + if (!dir->entries[i]) > + continue; > + > + for (j = i + 1; j < dir->nr; j++) { > + if (!dir->entries[j]) > + continue; > + if (check_contains(dir->entries[i], dir->entries[j])) { > + nr_removed++; > + free(dir->entries[j]); > + dir->entries[j] = NULL; > + } > + else { > + break; > + } > + } > + } This loop is O(n^2). I wonder if we can do better, especially we know dir->entries[] is sorted already. Well, because it is sorted, if A/, A/B, and A/B/C are all untracked, the first round that scans for A/ will nuke both A/B and A/B/C, so we won't have to scan looking for entries inside A/B, which is a bit of consolation ;-) > + for (i = 0;;) { > + while (i < dir->nr && dir->entries[i]) > + i++; > + if (i == dir->nr) > + break; > + j = i; > + while (j < dir->nr && !dir->entries[j]) > + j++; > + if (j == dir->nr) > + break; > + dir->entries[i] = dir->entries[j]; > + dir->entries[j] = NULL; > + } > + dir->nr -= nr_removed; This looks like an overly complicated way to scan an array and skip NULLs. Are you doing an equivalent of this loop, or am I missing something subtle? for (src = dst = 0; src < nr; src++) if (array[src]) array[dst++] = src; nr = dst;