Cygwin sparse checkout degrades performance

Counter-intuitively, enabling sparse checkout under Cygwin makes "git status" slower, apparently because status tries to "stat" files and directories that sparse checkout has removed from the working tree.

To demonstrate, I created a repo with 100k random files in a dir/dir/dir/file structure on a Linux box (doing this in Cygwin requires piping the output of "openssl rand" through "dos2unix", because the output contains "\r") and then cloned it in a Cygwin shell:

git init test
cd test
git commit --allow-empty -m 'Empty first commit'
for i in {1..10}; do for j in {1..10000}; do file=$( openssl rand -hex 32 | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,'); mkdir -p $( dirname $file ); echo $file > $file ; done & done; wait
git add .
git commit -m '100000 files'
git gc --prune=now --aggressive
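For anyone generating the files directly in Cygwin, here is a minimal sketch of the carriage-return workaround mentioned above. It strips "\r" with tr instead of dos2unix (a substitution, not what I ran), and the helper name hex_to_path is hypothetical:

```shell
#!/usr/bin/env bash
# hex_to_path (hypothetical name): read a hex string on stdin, drop any
# carriage return, and turn the first three characters into directories.
hex_to_path() {
  tr -d '\r' | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,'
}

# Fixed input standing in for "openssl rand -hex 32" output with a trailing \r.
printf 'abcdef\r\n' | hex_to_path    # prints a/b/c/def
```

In the generation loop, this would replace the bare sed call, as in file=$( openssl rand -hex 32 | hex_to_path ).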

I then timed and plotted "git status" while sparse checkout reduced the number of files in the working tree step by step, using the following command:

(
  # Baseline: warm the cache, then time status with all 100000 files present.
  ( git status >& /dev/null; time -p git status > /dev/null ) |& sed -n '/real/{s/real/100000/p}'
  git config core.sparseCheckout true
  # Shrink the working tree in steps of 10000 files, then down to 1 file.
  for i in $( seq 90000 -10000 10000 ) 1; do
    git ls-files | head -n $i | sed 's,^,/,' > .git/info/sparse-checkout
    git read-tree -u -m HEAD
    git status >& /dev/null
    ( time -p git status > /dev/null ) |& sed -n "/real/{s/real/$i/p}"
  done
  # Restore the full working tree and turn sparse checkout off again.
  echo '*' > .git/info/sparse-checkout
  git read-tree -u -m HEAD
  rm .git/info/sparse-checkout
  git config --unset core.sparseCheckout
) | gnuplot -p -e "set terminal dumb; set xrange[] reverse; set style data dots; set nokey; plot '-' using 1:2"
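To confirm that each step really shrank the working tree, one option is to count the index entries marked skip-worktree, which "git ls-files -v" tags with "S" (or "s" when the entry is also assume-unchanged). This is a sketch and not part of the measurement above; the helper name count_skip_worktree is hypothetical:

```shell
#!/usr/bin/env bash
# count_skip_worktree (hypothetical name): read "git ls-files -v" output on
# stdin and print how many entries carry the skip-worktree tag.
count_skip_worktree() {
  grep -c '^[Ss] '
}

# Usage inside the repository, after "git read-tree -u -m HEAD":
#   git ls-files -v | count_skip_worktree
```

With 100000 tracked files and $i of them kept by the sparse-checkout patterns, the count should be 100000 - $i.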

The vertical axis is time in seconds; the horizontal axis is the number of files in the working tree after the sparse checkout.

Linux results (v2.1.0):
[ASCII plot, garbled in the archive. x axis: files in the working tree, 100000 down to 0. y axis: seconds, 0.1 to 0.45. Times fall as files are removed, from a few tenths of a second with all 100000 files to about 0.15 s with 1 file.]

Cygwin results (v2.1.1):
[ASCII plot, garbled in the archive. x axis: files in the working tree, 100000 down to 0. y axis: seconds, 4 to 10. Times start a little under 5 s with all 100000 files and rise to between 9 and 10 s with 1 file.]

Linux times do what I expect and want (they improve as the number of working tree files decreases), but Cygwin does the opposite: the worst time is in a working tree with only 1 (sparse) file, and it is double the time I started with before enabling sparse checkout! I'd hoped sparse checkout would improve the too-slow status times I see when all files are present...

Looking at strace with a working tree consisting of a single (sparse) file suggests Cygwin is attempting to access the non-existent files and directories whereas Linux does not appear to do so. In fact, if I do nothing more than "mkdir -p $( git ls-files | cut -c1-5 | sort -u )" when looking at a single (sparse) file, I can drop status times below 3s, a 3-fold improvement and something at least better than where I started!
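The mkdir workaround above relies on every directory name in this test repo being a single character, which is why "cut -c1-5" works. A sketch that generalizes it to arbitrary paths follows; the helper name make_parent_dirs is hypothetical, and I have only observed the speed-up in this test repo:

```shell
#!/usr/bin/env bash
# make_parent_dirs (hypothetical name): read tracked paths on stdin and
# create the parent directory of each one, leaving the files themselves absent.
make_parent_dirs() {
  # Strip the last path component; -n plus the p flag skips top-level files.
  sed -n 's,/[^/]*$,,p' | sort -u | while IFS= read -r dir; do
    mkdir -p -- "$dir"
  done
}

# Usage, from the top of the working tree:
#   git ls-files | make_parent_dirs
```

Note that this leaves empty directories in the working tree.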

Is there a way I can get improved status times using sparse checkout with Cygwin?