On 04/06, Jeff Hostetler wrote:
> 
> 
> On 4/6/2017 6:14 PM, Thomas Gummerer wrote:
> >On 04/06, git@xxxxxxxxxxxxxxxxx wrote:
> >>From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> >>
> >>Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> >>---
> >> t/perf/p0005-status.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 61 insertions(+)
> >> create mode 100755 t/perf/p0005-status.sh
> >>
> >>diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
> >>new file mode 100755
> >>index 0000000..704cebc
> >>--- /dev/null
> >>+++ b/t/perf/p0005-status.sh
> >>@@ -0,0 +1,61 @@
> >>+#!/bin/sh
> >>+
> >>+test_description="Tests performance of read-tree"
> >>+
> >>+. ./perf-lib.sh
> >>+
> >>+test_perf_default_repo
> >>+test_checkout_worktree
> >>+
> >>+## usage: dir depth width files
> >>+make_paths () {
> >>+	for f in $(seq $4)
> >>+	do
> >>+		echo $1/file$f
> >>+	done;
> >>+	if test $2 -gt 0;
> >>+	then
> >>+		for w in $(seq $3)
> >>+		do
> >>+			make_paths $1/dir$w $(($2 - 1)) $3 $4
> >>+		done
> >>+	fi
> >>+	return 0
> >>+}
> >>+
> >>+fill_index () {
> >>+	make_paths $1 $2 $3 $4 |
> >>+	sed "s/^/100644 $EMPTY_BLOB	/" |
> >>+	git update-index --index-info
> >>+	return 0
> >>+}
> >>+
> >>+br_work1=xxx_work1_xxx
> >>+dir_new=xxx_dir_xxx
> >>+
> >>+## (5, 10, 9) will create 999,999 files.
> >>+## (4, 10, 9) will create 99,999 files.
> >>+depth=5
> >>+width=10
> >>+files=9
> >>+
> >>+## Inflate the index with thousands of empty files and commit it.
> >>+## Use reset to actually populate the worktree.
> >>+test_expect_success 'inflate the index' '
> >>+	git reset --hard &&
> >>+	git branch $br_work1 &&
> >>+	git checkout $br_work1 &&
> >>+	fill_index $dir_new $depth $width $files &&
> >>+	git commit -m $br_work1 &&
> >>+	git reset --hard
> >>+'
> >>+
> >>+## The number of files in the branch.
> >>+nr_work1=$(git ls-files | wc -l)
> >
> >The above seems to be repeated (or at least very similar to what you
> >have in your other series [1]).
> >Especially in this perf test, wouldn't
> >it be better to just use test_perf_large_repo, and let whoever runs
> >the test decide what constitutes a large repository for them?
> >
> >The other advantage of that would be that it is more of a real-world
> >scenario, instead of a synthetic distribution of the files, which
> >would give us some better results I think.
> >
> >Is there anything I'm missing that would make using
> >test_perf_large_repo not a good option here?
> 
> Yes, it is copied from the other series.  I'll make the same change
> on it that Rene just suggested, to use awk to create the list.
> 
> I did this because I need very large repos.  From what I can tell
> the common usage is to set test_perf_large_repo to linux.git, but
> that only has 58K files.  And it defaults to git.git, which only
> has 3K files.

Yeah, true.  Back when I worked on "index v5" for my GSoC project, I
used to use the webkit repository, which at the time had
300-something K files.  Nowadays the better test might be the
chromium repository, but I'm not sure (cloning that takes a while on
my connection :) ).

> Internally, I test against the Windows source tree with 3.1M files,
> but I can't share that :-)

Heh.  I'd love to see the performance numbers for that, though!

> So I created this test to generate artificial, but large and
> reproducible, repos for evaluation.
> 
> I could change the default depth to 4 (giving a 100K tree), but
> I'm really interested in 1M+ repos.  For small-ish values of n
> the difference between O(n) and O(n log n) operations can hide
> in system and I/O noise; not so for very large n....

Makes sense to me.  Thanks for the explanation!

> >
> >[1]: http://public-inbox.org/git/20170406163442.36463-3-git@xxxxxxxxxxxxxxxxx/
> >
> >>+test_perf "read-tree status work1 ($nr_work1)" '
> >>+	git read-tree HEAD &&
> >>+	git status
> >>+'
> >>+
> >>+test_done
> >>--
> >>2.9.3
> >>
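As an aside, the (depth, width, files) counts in the patch's comments can be sanity-checked with a small helper.  make_paths puts $files files in the current directory and, while depth > 0, recurses into $width subdirectories, so the directory count is a geometric series and the total file count is files * (width^(depth+1) - 1) / (width - 1).  A sketch (count_paths is a hypothetical helper, not part of the patch):

```shell
# Hypothetical helper: compute how many paths make_paths <dir> <depth> <width> <files>
# would emit, without actually generating them.
count_paths () {
	# dirs = 1 + width + width^2 + ... + width^depth
	# total = dirs * files
	awk -v d="$1" -v w="$2" -v f="$3" '
	BEGIN {
		dirs = 0
		p = 1
		for (i = 0; i <= d; i++) {
			dirs += p
			p *= w
		}
		print dirs * f
	}'
}

count_paths 5 10 9	# 999999, matching the "(5, 10, 9)" comment
count_paths 4 10 9	# 99999, matching the "(4, 10, 9)" comment
```

This confirms the comments in the patch: depth=5 gives 111,111 directories of 9 files each, i.e. 999,999 files, and depth=4 gives 99,999.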