Re: [PATCH v3 4/8] dir: hide untracked contents of untracked dirs

Samuel Lijin <sxlijin@xxxxxxxxx> · Wed, 17 May 2017 03:32:09 -0400

On Wed, May 17, 2017 at 2:47 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Samuel Lijin <sxlijin@xxxxxxxxx> writes:
>
>> When we taught read_directory_recursive() to recurse into untracked
>> directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that
>> had the side effect of teaching it to collect the untracked contents of
>> untracked directories. It doesn't always make sense to return these,
>> though (we do need them for `clean -d`), so we introduce a flag
>> (DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory()
>> strips dir->entries of the untracked contents of untracked dirs.
>>
>> We also introduce check_contains() to check if one dir_entry corresponds
>> to a path which contains the path corresponding to another dir_entry.
>>
>> Signed-off-by: Samuel Lijin <sxlijin@xxxxxxxxx>
>> ---
>>  dir.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  dir.h |  3 ++-
>>  2 files changed, 56 insertions(+), 1 deletion(-)
>>
>> diff --git a/dir.c b/dir.c
>> index 6bd0350e9..214a148ee 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1852,6 +1852,14 @@ static int cmp_name(const void *p1, const void *p2)
>>       return name_compare(e1->name, e1->len, e2->name, e2->len);
>>  }
>>
>> +/* check if *out lexically contains *in */
>> +static int check_contains(const struct dir_entry *out, const struct dir_entry *in)
>> +{
>> +     return (out->len < in->len) &&
>> +                     (out->name[out->len - 1] == '/') &&
>> +                     !memcmp(out->name, in->name, out->len);
>> +}
>
> OK, treat_one_path() and treat_pah_fast() both ensure that a path to
> a directory is terminated with '/' before calling dir_add_name() and
> dir_add_ignored(), so we know a dir_entry "out" that is a directory
> must end with '/'.  Good.
>
> The second and third line being overly indented is a bit
> distracting, though.
>
>>  static int treat_leading_path(struct dir_struct *dir,
>>                             const char *path, int len,
>>                             const struct pathspec *pathspec)
>> @@ -2067,6 +2075,52 @@ int read_directory(struct dir_struct *dir, const char *path,
>>               read_directory_recursive(dir, path, len, untracked, 0, pathspec);
>>       QSORT(dir->entries, dir->nr, cmp_name);
>>       QSORT(dir->ignored, dir->ignored_nr, cmp_name);
>> +
>> +     // if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick
>> +     // up untracked contents of untracked dirs; by default we discard these,
>> +     // but given DIR_KEEP_UNTRACKED_CONTENTS we do not
>
>         /*
>          * Our multi-line comments are formatted like this
>          * example.  No C++/C99 // comments, outside of
>          * borrowed code and platform specific compat/ code,
>          * please.
>          */

Gahhhh, I keep forgetting about this, sorry. (There has to be a way to
tell my compiler to catch this, right? It's pretty embarrassing to get
called out for this twice...)

>> +     if ((dir->flags & DIR_SHOW_IGNORED_TOO)
>> +                  && !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) {
>
> Both having && at the end and && at the beginning are valid C, but
> please stick to one style in a single file.

Got it.

>> +             int i, j, nr_removed = 0;
>> +
>> +             // remove from dir->entries untracked contents of untracked dirs
>
>         /* And our single-liner comments look like this */
>
>> +             for (i = 0; i < dir->nr; i++) {
>> +                     if (!dir->entries[i])
>> +                             continue;
>> +
>> +                     for (j = i + 1; j < dir->nr; j++) {
>> +                             if (!dir->entries[j])
>> +                                     continue;
>> +                             if (check_contains(dir->entries[i], dir->entries[j])) {
>> +                                     nr_removed++;
>> +                                     free(dir->entries[j]);
>> +                                     dir->entries[j] = NULL;
>> +                             }
>> +                             else {
>> +                                     break;
>> +                             }
>> +                     }
>> +             }
>
> This loop is O(n^2).  I wonder if we can do better, especially we
> know dir->entries[] is sorted already.

Now that I think about it, dropping an `i = j - 1` into the inner loop
right before the break should work:

+                             else {
+                                     i = j - 1;
+                                     break;
+                             }

> Well, because it is sorted, if A/, A/B, and A/B/C are all untracked,
> the first round that scans for A/ will nuke both A/B and A/B/C, so
> we won't have to scan looking for entries inside A/B, which is a bit
> of consolation ;-)
>
>> +                     for (i = 0;;) {
>> +                             while (i < dir->nr && dir->entries[i])
>> +                                     i++;
>> +                             if (i == dir->nr)
>> +                                     break;
>> +                             j = i;
>> +                             while (j < dir->nr && !dir->entries[j])
>> +                                     j++;
>> +                             if (j == dir->nr)
>> +                                     break;
>> +                             dir->entries[i] = dir->entries[j];
>> +                             dir->entries[j] = NULL;
>> +                     }
>> +                     dir->nr -= nr_removed;
>
> This looks like an overly complicated way to scan an array and skip
> NULLs.  Are you doing an equivalent of this loop, or am I missing
> something subtle?
>
>         for (src = dst = 0; src < nr; src++)
>                 if (array[src])
>                         array[dst++] = src;
>         nr = dst;

Nope, that's pretty much it. Just me overthinking the problem.