On 04/24, Ævar Arnfjörð Bjarmason wrote:
> Russ Cox just published an article about how various glob()
> implementations suffer from pathological performance when fed certain
> pathological patterns like "a*a*a*a*b" given a file like "aaaaaaa...":
> https://research.swtch.com/glob
>
> I was curious to see if this impacted git. It turns out it does. This
> used to be a per-platform issue with git, since globbing was
> implemented via fnmatch() by default before v1.8.4, and support for
> using the OS fnmatch() was removed entirely in v2.0.0.
>
> This performance test shows the regression:
>
>     $ GIT_PERF_REPEAT_COUNT=1 GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 v2.12.0 p0100-globbing.sh
>     [...]
>     Test                                                      v1.9.5            v2.0.0                    v2.12.0
>     ------------------------------------------------------------------------------------------------------------------------------
>     [...]
>     0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>     0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>     0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3   0.00(0.00+0.00)   0.00(0.00+0.00) =         0.01(0.00+0.00) +inf
>     0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4  0.00(0.00+0.00)   0.01(0.01+0.00) +inf      0.02(0.02+0.00) +inf
>     0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5  0.00(0.00+0.00)   0.20(0.19+0.00) +inf      0.24(0.23+0.00) +inf
>     0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6  0.00(0.00+0.00)   3.03(3.00+0.00) +inf      3.08(3.05+0.00) +inf
>
> And here's a one-liner to do the same:
>
>     $ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 'print "a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 'a*a*a*a*a*a*a*b'))
>
> Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
> seconds on my system, 40 seconds with 7 etc.
>
> I don't think this is something we need to worry much about, if you
> have a file like this and feed Git insane patterns you probably
> deserve what you get.
>
> The real concern is if we have behavior like this and ever e.g. expose
> globbing over the network, e.g. in some future upload-pack protocol.
>
> There are probably some web-based programs built on top of git that
> are vulnerable to DoS attacks as a result of this, e.g. if they take
> user-supplied globs and feed them to ls-files.

I was taking a look at wildmatch a few months ago and have an unfinished
patch to do some cleanup there.  I noticed this was inefficient but
didn't expect those kinds of numbers.  I wonder how difficult it would
be to rewrite it so that we don't have this issue.

-- 
Brandon Williams
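
On the question of how hard a rewrite might be: a common way to avoid
the exponential blowup is to remember only the most recent '*' and, on
a mismatch, restart from one character past where that '*' last tried
to match. The sketch below illustrates the idea in Python rather than C,
purely for brevity. It is not git's wildmatch code: glob_match is a
hypothetical name, and the sketch assumes a pattern language of only
literals, '?' and '*', with no '**', bracket classes, or special
handling of '/'.

```python
def glob_match(pattern: str, name: str) -> bool:
    """Match name against pattern, supporting only literals, '?' and '*'.

    Only the most recent '*' is ever retried, so the worst case is
    O(len(pattern) * len(name)) instead of exponential in the number
    of '*' wildcards.
    """
    px = nx = 0            # current positions in pattern and name
    next_px = next_nx = 0  # restart point; next_nx == 0 means "no '*' seen yet"

    while px < len(pattern) or nx < len(name):
        if px < len(pattern):
            c = pattern[px]
            if c == "*":
                # Try matching zero characters first; if that fails later,
                # resume here with the '*' consuming one more character.
                next_px = px
                next_nx = nx + 1
                px += 1
                continue
            if nx < len(name) and (c == "?" or c == name[nx]):
                px += 1
                nx += 1
                continue
        # Mismatch: fall back to the last '*' if it can still grow.
        if 0 < next_nx <= len(name):
            px = next_px
            nx = next_nx
            continue
        return False
    return True


if __name__ == "__main__":
    # Same shape as the one-liner above: 100 'a's plus ".t", seven "a*" groups.
    print(glob_match("a*a*a*a*a*a*a*b", "a" * 100 + ".t"))
```

With this approach the pattern from the one-liner is rejected after a
number of steps roughly proportional to the pattern length times the
file name length, so adding more "a*" groups no longer multiplies the
runtime.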