Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 30, 2020 at 9:13 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Thu, Jan 30, 2020 at 7:55 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
[...]
> > > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> > >
> > >       /* This is the "show_other_directories" case */
> > >
> > > -     if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> > > +     /*
> > > +      * We only need to recurse into untracked/ignored directories if
> > > +      * either of the following bits is set:
> > > +      *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
> > > +      *                           there are ignored directories below)
> > > +      *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
> > > +      *                                 the directory is empty)
> >
> > Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED
> > flag to ensure the untracked cache loads all untracked paths?
>
> Do you mean DIR_KEEP_UNTRACKED_CONTENTS (which is documented in dir.h
> as only having meaning when DIR_SHOW_IGNORED_TOO is also set, and thus
> caused me to not list it separately)?
>
> Speaking of DIR_KEEP_UNTRACKED_CONTENTS, though, its handling as a
> post-processing step in read_directory() is now inconsistent with how
> we handle squashing a directory full of ignores into just marking the
> containing directory as ignored.  I think I should move the
> read_directory() logic for DIR_KEEP_UNTRACKED_CONTENTS to
> treat_directory() and use another counter similar to old_ignored_nr.
> It should be more efficient that way, too.

Oh, actually, I think I understand what you're getting at so let me
clear it up.  With DIR_SHOW_IGNORED_TOO, we always recurse to the
bottom, because it's needed to find any files that might be ignored.
(Maybe we could do something clever with checking .gitignore entries
and seeing if it's impossible for them to match anything below the
current directory, but the code doesn't do anything that clever.)  As
a side effect, we'll get all untracked files whenever that flag is
set.  As such, the only question is whether we want to keep all those
extra untracked files that we found or not, which is the purpose of
DIR_KEEP_UNTRACKED_CONTENTS.  Without DIR_SHOW_IGNORED_TOO, there's no
need or want to visit all untracked files without also learning of all
ignored files (and, in fact, git-clean is currently the only one that
wants to know about all untracked files).

As far as a simple test goes, in a simple repository with a file named
   one/two/three/four/five/untracked-file
and with nothing else under one/:

Before my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep
'open("one/' | grep -v gitignore.*ENOENT | wc -l
    62
Note that 62 == 2^5 + 2^4 + 2^3 + 2^2 + 2^1, showing how many
directories we open and read.

After my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep
'open("one/' | grep -v gitignore.*ENOENT | wc -l
    5
showing that it does open and read each directory, but does so only once.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux