On Mon, Jan 27, 2020 at 4:08 AM SZEDER Gábor <szeder.dev@xxxxxxxxx> wrote:
>
> On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> > Hi all, I have run across what might be a bug in git. When there is a
> > deep directory structure (tried on 100+ nested dirs), then git status
> > --ignored hangs indefinitely.
> > Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> > reproduces in an Ubuntu (19.04, git 2.25.0) Docker container on OSX and
> > also on a baremetal Ubuntu server (16.04, git 2.17.1).
> >
> > Steps to reproduce:
> >
> > 1. Generate the deep dir structure:
> >
> >   mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
> >   mkdir dir; cd dir; done; touch leaf; popd
> >
> > 2. Try to get git status --ignored
> >
> >   cd gittest && git status --ignored
> >
> >
> > If there is a dir depth limit, git should probably exit with an error
> > rather than getting stuck endlessly.
>
> This is interesting, thanks for the report.

Agreed.

> There is no such directory depth limit, but the runtime of 'git status
> --ignored' grows quickly with the depth of the untracked directory.
> Running this shell loop produces the numbers below:
>
>   for depth in $(seq 10 30)
>   do
>     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>     rm -rf dir
>     mkdir -p $dirs
>     >$dirs/untracked-file
>     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
>   done
>
>   10: 0.01
>   11: 0.03
>   12: 0.05
>   13: 0.11
>   14: 0.23
>   15: 0.47
>   16: 0.97
>   17: 1.97
>   18: 3.88
>   19: 7.85
>   20: 16.29
>   21: 32.92
>   22: 76.24

Wow. Really nice testcase, though, thanks.

> Beautifully quadratic, isn't it? :)

I think you mean exponential (in particular 2^n rather than n^2).

> Unless I messed up my numbers, with a depth of 120 directories it
> would take over 6*10^23 years to complete... so yeah, it does qualify
> as indefinitely.

No comment about how people today are spoiled and addicted to instant
gratification? I mean, can't folks just be a little patient? ;-)

> This slowdown was caused by commit df5bcdf83a (dir: recurse into
> untracked dirs for ignored files, 2017-05-18), which was part of a
> patch series to fix 'git clean -d' deleting untracked directories even
> if they contained ignored files.
>
> Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> fun with 'dir.c' recently.

Heh, yes, what "fun" it was.

Anyway, after digging around for quite a bit today... that commit added
calling read_directory_recursive() directly from itself for certain
untracked paths. This means that read_directory_recursive() (which I'll
abbreviate to r_d_r()), when we're dealing with certain untracked
paths:

  * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
  * Calls r_d_r() directly as well

So, from the toplevel directory, r_d_r() will call itself twice on the
next directory down. For each of those, it'll call r_d_r() twice on the
second directory down. From each of those, it'll call r_d_r() twice on
the third directory in the hierarchy, and so on, until we have 2^n
calls to r_d_r() for a directory n levels deep.

Trying to back out the underlying problem, I _think_ the cause behind
all this is that r_d_r() and friends all use the path_treatment enum,
which says that the treatment of any path has to be one of four fixed
values.
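For reference, that enum in dir.c looks roughly like this (paraphrased
from memory; the exact comments and placement may differ a bit between
the git versions mentioned above):

  /* dir.c: how a single path is classified while scanning a directory */
  enum path_treatment {
          path_none = 0,
          path_recurse,
          path_excluded,
          path_untracked
  };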
The dichotomy between path_untracked and path_recurse in particular
means the system has no way to say that something should be both marked
as untracked and recursed into, yet we definitely need to have some
directories marked as untracked and we also need to recurse into them.
This, I think, led to Samuel's attempt to work around that dichotomy by
having the code in r_d_r() check for certain path_untracked cases which
should also be recursed into.

I think adding another value to the enum and shifting the logic
elsewhere might enable us to both simplify the logic and avoid this
expensive exponential behavior, but I haven't gotten it working yet.
We'll see if my patch goes anywhere, or if it's just another dead-end
among many.

Elijah