Re: [PATCH v3 2/2] p0005-status: time status on very large repo

On 4/6/2017 6:14 PM, Thomas Gummerer wrote:
On 04/06, git@xxxxxxxxxxxxxxxxx wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
---
 t/perf/p0005-status.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100755 t/perf/p0005-status.sh

diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
new file mode 100755
index 0000000..704cebc
--- /dev/null
+++ b/t/perf/p0005-status.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description="Tests performance of read-tree and status on a very large repo"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+## usage: dir depth width files
+make_paths () {
+	for f in $(seq $4)
+	do
+		echo $1/file$f
+	done
+	if test $2 -gt 0
+	then
+		for w in $(seq $3)
+		do
+			make_paths $1/dir$w $(($2 - 1)) $3 $4
+		done
+	fi
+	return 0
+}
+
+fill_index () {
+	make_paths $1 $2 $3 $4 |
+	sed "s/^/100644 $EMPTY_BLOB	/" |
+	git update-index --index-info
+	return 0
+}
+
+br_work1=xxx_work1_xxx
+dir_new=xxx_dir_xxx
+
+## (5, 10, 9) will create 999,999 files.
+## (4, 10, 9) will create  99,999 files.
+depth=5
+width=10
+files=9
+
+## Inflate the index with a very large number of empty files and commit it.
+## Use reset to actually populate the worktree.
+test_expect_success 'inflate the index' '
+	git reset --hard &&
+	git branch $br_work1 &&
+	git checkout $br_work1 &&
+	fill_index $dir_new $depth $width $files &&
+	git commit -m $br_work1 &&
+	git reset --hard
+'
+
+## The number of files in the branch.
+nr_work1=$(git ls-files | wc -l)

The above seems to be repeated (or at least very similar to) what you
have in your other series [1].  Especially in this perf test, wouldn't
it be better to just use test_perf_large_repo, and let whoever runs the
test decide what constitutes a large repository for them?

The other advantage would be that it is more of a real-world scenario,
rather than a synthetic distribution of files, which I think would give
us better results.

Is there anything I'm missing that would make using
test_perf_large_repo not a good option here?

Yes, it is copied from the other series.  I'll make the same change
here that Rene just suggested on that series, using awk to create the list.
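For reference, an awk-based generator along those lines might look like
the sketch below (this is an assumed shape of the suggested change, not
Rene's exact patch); it emits the same dir/fileN path list as the
recursive shell make_paths, without forking per directory:

```shell
# Sketch only: awk replacement for the recursive shell make_paths.
# depth/width/files mirror the shell function's arguments; root is
# the starting directory name.
awk -v depth=2 -v width=3 -v files=2 -v root=xxx_dir_xxx '
# Extra parameters w and f after the blank gap act as locals,
# per the usual awk convention.
function walk(prefix, d,    w, f) {
	for (f = 1; f <= files; f++)
		print prefix "/file" f
	if (d > 0)
		for (w = 1; w <= width; w++)
			walk(prefix "/dir" w, d - 1)
}
BEGIN { walk(root, depth) }
'
```

With depth=2, width=3, files=2 this prints 2 files in each of the
13 directories (1 + 3 + 9), i.e. 26 paths.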

I did this because I need very large repos.  From what I can tell, the
common practice is to point test_perf_large_repo at linux.git, but
that only has 58K files.  And it defaults to git.git, which only
has 3K files.

Internally, I test against the Windows source tree with 3.1M files,
but I can't share that :-)

So I created this test to generate artificial, but large and
reproducible repos for evaluation.

I could change the default depth to 4 (giving a 100K-file tree), but
I'm really interested in 1M+ file repos.  For small-ish values of n,
the difference between O(n) and O(n log n) operations can hide in
system and I/O noise; not so for very large n.
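As a sanity check on those counts: the directories make_paths creates
form a geometric series 1 + width + width^2 + ... + width^depth, each
holding $files files.  A throwaway helper (hypothetical, not part of
the patch) confirms the numbers in the test's comment:

```shell
# Hypothetical helper, not in the patch: compute how many files
# make_paths will create for given depth/width/files arguments.
total_files () {
	d=$1 w=$2 f=$3
	# Sum the geometric series of directory counts, level by level.
	dirs=0 term=1 i=0
	while test $i -le $d
	do
		dirs=$((dirs + term))
		term=$((term * w))
		i=$((i + 1))
	done
	echo $((dirs * f))
}

total_files 5 10 9	# 999999
total_files 4 10 9	# 99999
```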


[1]: http://public-inbox.org/git/20170406163442.36463-3-git@xxxxxxxxxxxxxxxxx/

+test_perf "read-tree status work1 ($nr_work1)" '
+	git read-tree HEAD &&
+	git status
+'
+
+test_done
--
2.9.3
