Re: git status --ignored hangs when a deep directory structure present in working tree

On Mon, Jan 27, 2020 at 4:08 AM SZEDER Gábor <szeder.dev@xxxxxxxxx> wrote:
>
> On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> > Hi all, I have run across what might be a bug in git. When there is a
> > deep directory structure (tried with 100+ nested dirs), git status
> > --ignored hangs indefinitely.
> > Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> > reproduces in an Ubuntu (19.04, git 2.25.0) Docker container on OSX
> > and also on a bare-metal Ubuntu server (16.04, git 2.17.1).
> >
> > Steps to reproduce:
> >
> > 1. Generate the deep dir structure:
> >
> >     mkdir gittest; pushd gittest; git init
> >     for i in $(seq 1 120); do mkdir dir; cd dir; done
> >     touch leaf; popd
> >
> > 2. Try to get git status --ignored
> >
> >     cd gittest && git status --ignored
> >
> >
> > If there is a dir depth limit, git should probably exit with an error
> > rather than getting stuck endlessly.
>
> This is interesting, thanks for the report.

Agreed.

> There is no such directory depth limit, but the runtime of 'git status
> --ignored' grows quickly with the depth of the untracked directory.
> Running this shell loop produces the numbers below:
>
> for depth in $(seq 10 30)
> do
>         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>         rm -rf dir
>         mkdir -p $dirs
>         >$dirs/untracked-file
>         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> done
>
> 10: 0.01
> 11: 0.03
> 12: 0.05
> 13: 0.11
> 14: 0.23
> 15: 0.47
> 16: 0.97
> 17: 1.97
> 18: 3.88
> 19: 7.85
> 20: 16.29
> 21: 32.92
> 22: 76.24

Wow.

Really nice testcase, though, thanks.

> Beautifully quadratic, isn't it? :)

I think you mean exponential (in particular 2^n rather than n^2).

> Unless I messed up my numbers, with a depth of 120 directories it
> would take over 6*10^23 years to complete... so yeah, it does qualify
> as indefinitely.

No comment about how people today are spoiled and addicted to instant
gratification?  I mean, can't folks just be a little patient?  ;-)
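
(For what it's worth, the extrapolation checks out: the measurements
above double with each extra level, so starting from the 76 seconds
measured at depth 22, depth 120 would take about 76 * 2^98 seconds,
i.e. roughly 2.4*10^31 seconds, or a bit over 7*10^23 years.)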

> This slowdown was caused by commit df5bcdf83a (dir: recurse into
> untracked dirs for ignored files, 2017-05-18), which was part of a
> patch series to fix 'git clean -d' deleting untracked directories even
> if they contained ignored files.
>
> Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> fun with 'dir.c' recently.

Heh, yes, what "fun" it was.

Anyway, after digging around for quite a bit today: that commit made
read_directory_recursive() call itself directly for certain untracked
paths.  This means that read_directory_recursive() (which I'll
abbreviate to r_d_r()), when dealing with such a path, does two
things:

  * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
  * Calls r_d_r() directly as well

So, from the toplevel directory, r_d_r() will call itself twice on the
next directory down.  Each of those calls invokes r_d_r() twice on the
second directory down, each of those twice on the third, and so on,
until we have 2^n calls to r_d_r() for the directory n levels deep.
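
To make the blowup concrete, here is a tiny standalone C sketch (a
mock of the call pattern, not actual dir.c code; the function name is
borrowed only for illustration) showing how two recursive calls per
level yield 2^n total calls:

    #include <stdio.h>

    static unsigned long calls;

    /*
     * Mock of read_directory_recursive()'s call pattern for an
     * untracked directory chain: each level is entered once via
     * treat_path()/treat_directory() and once via the direct call,
     * so the number of invocations doubles per level.
     */
    static void r_d_r(int depth)
    {
        calls++;
        if (!depth)
            return;
        r_d_r(depth - 1);   /* via treat_directory() */
        r_d_r(depth - 1);   /* direct call added by df5bcdf83a */
    }

    int main(void)
    {
        int depth;
        for (depth = 10; depth <= 22; depth++) {
            calls = 0;
            r_d_r(depth);
            printf("depth %2d: %lu calls\n", depth, calls);
        }
        return 0;
    }

The call counts double with each extra level, matching the doubling
of the runtimes in the table above.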

Trying to back out the underlying problem, I _think_ the cause behind
all this is that r_d_r() and friends all use the path_treatment enum,
which forces the treatment of any path to be exactly one of its four
types.  The dichotomy between path_untracked and path_recurse in
particular means there is no way to mark a path as both untracked and
needing recursion, yet we definitely need to mark some directories as
untracked while also recursing into them.  This, I think, led to
Samuel's workaround for that dichotomy: having the code in r_d_r()
check for certain path_untracked cases which should also be recursed
into.  I think adding another type to the enum and shifting the logic
elsewhere might let us both simplify the logic and avoid this
expensive exponential behavior, but I haven't gotten it working yet.
We'll see if my patch goes anywhere, or if it's just another dead-end
among many.
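
For the curious, here is the rough shape of the idea, as a
hypothetical sketch rather than actual dir.c code (the first four
values match my reading of dir.c; the fifth, and its name, are made
up for illustration):

    /*
     * Hypothetical sketch -- dir.c's real enum has the first four
     * values; the fifth is invented here to illustrate the idea.
     */
    enum path_treatment {
        path_none = 0,
        path_recurse,
        path_excluded,
        path_untracked,
        /* untracked directory that must also be recursed into */
        path_untracked_recurse,
    };

With something like that, treat_directory() could report "untracked,
and recurse" in a single return value, so r_d_r() would descend into
each directory exactly once, turning the 2^n call pattern back into n.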


Elijah



