On Wed, May 17, 2017 at 2:47 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Samuel Lijin <sxlijin@xxxxxxxxx> writes: > >> When we taught read_directory_recursive() to recurse into untracked >> directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that >> had the side effect of teaching it to collect the untracked contents of >> untracked directories. It doesn't always make sense to return these, >> though (we do need them for `clean -d`), so we introduce a flag >> (DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory() >> strips dir->entries of the untracked contents of untracked dirs. >> >> We also introduce check_contains() to check if one dir_entry corresponds >> to a path which contains the path corresponding to another dir_entry. >> >> Signed-off-by: Samuel Lijin <sxlijin@xxxxxxxxx> >> --- >> dir.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> dir.h | 3 ++- >> 2 files changed, 56 insertions(+), 1 deletion(-) >> >> diff --git a/dir.c b/dir.c >> index 6bd0350e9..214a148ee 100644 >> --- a/dir.c >> +++ b/dir.c >> @@ -1852,6 +1852,14 @@ static int cmp_name(const void *p1, const void *p2) >> return name_compare(e1->name, e1->len, e2->name, e2->len); >> } >> >> +/* check if *out lexically contains *in */ >> +static int check_contains(const struct dir_entry *out, const struct dir_entry *in) >> +{ >> + return (out->len < in->len) && >> + (out->name[out->len - 1] == '/') && >> + !memcmp(out->name, in->name, out->len); >> +} > > OK, treat_one_path() and treat_pah_fast() both ensure that a path to > a directory is terminated with '/' before calling dir_add_name() and > dir_add_ignored(), so we know a dir_entry "out" that is a directory > must end with '/'. Good. > > The second and third line being overly indented is a bit > distracting, though. > >> static int treat_leading_path(struct dir_struct *dir, >> const char *path, int len, >> const struct pathspec *pathspec) >> @@ -2067,6 +2075,52 @@ int read_directory(struct dir_struct *dir, const char *path, >> read_directory_recursive(dir, path, len, untracked, 0, pathspec); >> QSORT(dir->entries, dir->nr, cmp_name); >> QSORT(dir->ignored, dir->ignored_nr, cmp_name); >> + >> + // if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick >> + // up untracked contents of untracked dirs; by default we discard these, >> + // but given DIR_KEEP_UNTRACKED_CONTENTS we do not > > /* > * Our multi-line comments are formatted like this > * example. No C++/C99 // comments, outside of > * borrowed code and platform specific compat/ code, > * please. > */ Gahhhh, I keep forgetting about this, sorry. (There has to be a way to tell my compiler to catch this, right? It's pretty embarrassing to get called out for this twice...) >> + if ((dir->flags & DIR_SHOW_IGNORED_TOO) >> + && !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) { > > Both having && at the end and && at the beginning are valid C, but > please stick to one style in a single file. Got it. >> + int i, j, nr_removed = 0; >> + >> + // remove from dir->entries untracked contents of untracked dirs > > /* And our single-liner comments look like this */ > >> + for (i = 0; i < dir->nr; i++) { >> + if (!dir->entries[i]) >> + continue; >> + >> + for (j = i + 1; j < dir->nr; j++) { >> + if (!dir->entries[j]) >> + continue; >> + if (check_contains(dir->entries[i], dir->entries[j])) { >> + nr_removed++; >> + free(dir->entries[j]); >> + dir->entries[j] = NULL; >> + } >> + else { >> + break; >> + } >> + } >> + } > > This loop is O(n^2). I wonder if we can do better, especially we > know dir->entries[] is sorted already. Now that I think about it, dropping an `i = j - 1` into the inner loop right before the break should work: + else { + i = j - 1; + break; + } > Well, because it is sorted, if A/, A/B, and A/B/C are all untracked, > the first round that scans for A/ will nuke both A/B and A/B/C, so > we won't have to scan looking for entries inside A/B, which is a bit > of consolation ;-) > >> + for (i = 0;;) { >> + while (i < dir->nr && dir->entries[i]) >> + i++; >> + if (i == dir->nr) >> + break; >> + j = i; >> + while (j < dir->nr && !dir->entries[j]) >> + j++; >> + if (j == dir->nr) >> + break; >> + dir->entries[i] = dir->entries[j]; >> + dir->entries[j] = NULL; >> + } >> + dir->nr -= nr_removed; > > This looks like an overly complicated way to scan an array and skip > NULLs. Are you doing an equivalent of this loop, or am I missing > something subtle? > > for (src = dst = 0; src < nr; src++) > if (array[src]) > array[dst++] = src; > nr = dst; Nope, that's pretty much it. Just me overthinking the problem.