On Mon, May 22, 2017 at 12:48 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Samuel Lijin <sxlijin@xxxxxxxxx> writes: > >> + for (j = i = 0; i < dir.nr;) { >> + for (; >> + j < dir.ignored_nr && >> + 0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]); >> + j++); >> + >> + if ((j < dir.ignored_nr) && >> + check_dir_entry_contains(dir.entries[i], dir.ignored[j])) { >> + /* skip any dir.entries which contains a dir.ignored */ >> + free(dir.entries[i]); >> + dir.entries[i++] = NULL; >> + } else { >> + /* prune the contents of a dir.entries which will be removed */ >> + struct dir_entry *ent = dir.entries[i++]; >> + for (; >> + i < dir.nr && >> + check_dir_entry_contains(ent, dir.entries[i]); >> + i++) { >> + free(dir.entries[i]); >> + dir.entries[i] = NULL; >> + } >> + } >> + } > > The second loop in the else clause is a bit tricky, and the comment > "which will be removed" is not all that helpful to explain why the > loop is there. > > But I think the code is correct. Here is how I understood it. > > While looking at dir.entries[i], the code noticed that nothing > in that directory is ignored. But entries in dir.entries[] that > come later may be contained in dir.entries[i] and we just want > to show the top-level untracked one (e.g. "a/" and "a/b/" were > in entries[], there is nothing in "a/", so naturally there is > nothing in "a/b/", but we do not want to bother showing > both---showing "a/" alone saying "the entire a/ is untracked" is > what we want). Yep, that's the gist of it. More specifically: the contents of untracked dirs have to be picked up so that clean -d can distinguish between purely untracked dirs and untracked dirs which also contain ignored files. In the case of a purely untracked dir, clean -d can remove the dir itself, and should just skip over all the dir's contents; in the case of a mixed untracked dir, clean -d should not remove the dir itself, just the untracked contents. > We may want to have a test to ensure "a/b/" is indeed omitted in > such a situation from the output, though. Is there a reason to ensure specifically a/b/ is tested, or are the current tests, which do cover the a/b (i.e. where a/b is a file, not where a/b/ is a dir) case, insufficient for some reason? > By the way, instead of putting NULL, it may be easier to follow if > you used two pointers, src and dst, into dir.entries[], just like > you did in your latest version of [PATCH 4/6]. That way, you do not > have to change anything in the later loop that walks over elements > in the dir.entries[] array. It would also help the logic easier to > follow if the above loop were its own helper function. Agreed on the helper function. On the src-dst thing: I considered it, but I figured another O(n) set of array moves was unnecessary. I guess this is one of those cases where premature optimization doesn't make sense? > Putting them all together, here is what I came up with that can be > squashed into your patch. I am undecided myself if this is easier > to follow than your version, but it seems to pass your test ;-) > > Thanks. > > builtin/clean.c | 70 ++++++++++++++++++++++++++++++++++----------------------- > 1 file changed, 42 insertions(+), 28 deletions(-) > > diff --git a/builtin/clean.c b/builtin/clean.c > index dd3308a447..c8712e7ac8 100644 > --- a/builtin/clean.c > +++ b/builtin/clean.c > @@ -851,9 +851,49 @@ static void interactive_main_loop(void) > } > } > > +static void simplify_untracked(struct dir_struct *dir) > +{ > + int src, dst, ign; > + > + for (src = dst = ign = 0; src < dir->nr; src++) { > + /* > + * Skip entries in ignored[] that cannot be inside > + * entries[src] > + */ > + while (ign < dir->ignored_nr && > + 0 <= cmp_dir_entry(&dir->entries[src], &dir->ignored[ign])) > + ign++; > + > + if (dir->ignored_nr <= ign || > + !check_dir_entry_contains(dir->entries[src], dir->ignored[ign])) { > + /* > + * entries[src] does not contain an ignored > + * path -- we need to keep it. But we do not > + * want to show entries[] that are contained > + * in entries[src]. > + */ > + struct dir_entry *ent = dir->entries[src++]; > + dir->entries[dst++] = ent; > + while (src < dir->nr && > + check_dir_entry_contains(ent, dir->entries[src])) { > + free(dir->entries[src++]); > + } > + /* compensate for the outer loop's loop control */ > + src--; > + } else { > + /* > + * entries[src] contains an ignored path -- > + * drop it. > + */ > + free(dir->entries[src]); > + } > + } > + dir->nr = dst; > +} > + > int cmd_clean(int argc, const char **argv, const char *prefix) > { > - int i, j, res; > + int i, res; > int dry_run = 0, remove_directories = 0, quiet = 0, ignored = 0; > int ignored_only = 0, config_set = 0, errors = 0, gone = 1; > int rm_flags = REMOVE_DIR_KEEP_NESTED_GIT; > @@ -928,30 +968,7 @@ int cmd_clean(int argc, const char **argv, const char *prefix) > prefix, argv); > > fill_directory(&dir, &pathspec); > - > - for (j = i = 0; i < dir.nr;) { > - for (; > - j < dir.ignored_nr && > - 0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]); > - j++); > - > - if ((j < dir.ignored_nr) && > - check_dir_entry_contains(dir.entries[i], dir.ignored[j])) { > - /* skip any dir.entries which contains a dir.ignored */ > - free(dir.entries[i]); > - dir.entries[i++] = NULL; > - } else { > - /* prune the contents of a dir.entries which will be removed */ > - struct dir_entry *ent = dir.entries[i++]; > - for (; > - i < dir.nr && > - check_dir_entry_contains(ent, dir.entries[i]); > - i++) { > - free(dir.entries[i]); > - dir.entries[i] = NULL; > - } > - } > - } > + simplify_untracked(&dir); > > for (i = 0; i < dir.nr; i++) { > struct dir_entry *ent = dir.entries[i]; > @@ -959,9 +976,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix) > struct stat st; > const char *rel; > > - if (!ent) > - continue; > - > if (!cache_name_is_other(ent->name, ent->len)) > continue; >