Re: [PATCH v3] refs: Use binary search to lookup refs faster

On 09/30/2011 01:48 AM, Junio C Hamano wrote:
> This version looks sane, although I have a suspicion that it may have
> some interaction with what Michael may be working on.

Indeed, I have almost equivalent changes in the giant patch series that
I am working on [1].  The branch is very experimental.  The tip
currently passes all the tests, but it has a known performance
regression when "git fetch" is used to fetch many commits.


But before comparing ref-related optimizations, we have an *urgent* need
for a decent performance test suite.  There are many slightly different
scenarios that have very different performance characteristics, and we
have to be sure that we are optimizing for the whole palette of
many-reference use cases.  So I made an attempt at a kludgey but
somewhat flexible performance-testing script [2].  I don't know whether
something like this should be integrated into the git project, and if so
where; suggestions are welcome.


To run the tests, from the root of the git source tree:

    make # make sure git is up-to-date
    t/make-refperf-repo --help
    t/make-refperf-repo [OPTIONS]
    t/refperf
    cat refperf.times # See the results

The default repo has 5k commits in a linear series with one reference on
each commit.  (These numbers can both be adjusted.)

The reference namespace can be laid out a few ways:

* Many references in a single "directory" vs. sharded over many
"directories"

* In lexicographic order by commit, in reverse order, or "shuffled".
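
To make the layout options concrete, here is a minimal Python sketch of how
such reference names could be generated.  It is not taken from
make-refperf-repo; the function name, the "shard_size" parameter, and the
naming scheme ("refs/heads/bNNNNNN") are all assumptions made for
illustration.  Entry i of the returned list is the reference that would be
placed on commit i.

```python
import random


def make_ref_names(count, shard_size=None, order="forward", seed=0):
    """Return `count` branch names; entry i is meant for commit i.

    Hypothetical helper, not part of make-refperf-repo.
    shard_size=None puts every ref in one "directory"; otherwise refs
    are spread over subdirectories holding `shard_size` refs each.
    """
    names = []
    for i in range(count):
        if shard_size:
            # Sharded layout: refs/heads/0000/b000000, refs/heads/0000/b000001, ...
            names.append("refs/heads/%04d/b%06d" % (i // shard_size, i))
        else:
            # Flat layout: everything directly under refs/heads/
            names.append("refs/heads/b%06d" % i)

    # Zero-padding keeps lexicographic order equal to numeric order,
    # so "forward" means names sort in the same order as the commits.
    if order == "reverse":
        names.reverse()
    elif order == "shuffled":
        random.Random(seed).shuffle(names)  # seeded, so runs are repeatable
    elif order != "forward":
        raise ValueError("unknown order: %r" % (order,))
    return names
```

Seeding the shuffle keeps the "shuffled" layout identical between runs, which
matters if timings from different git versions are to be compared.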

By default, the repo is written to "refperf-repo".

The time it takes to create the test repository is itself also an
interesting benchmark.  For example, on the maint branch it is terribly
slow unless it is passed either the --pack-refs-interval=N (with N, say
100) or --no-replace-object option.  I also noticed that if it is run like

    t/make-refperf-repo --refs=5000 --commits=5000 \
            --pack-refs-interval=100

(one ref per commit), git-pack-refs becomes precipitously and
dramatically slower after the 2000th commit.

I haven't had time yet for systematic benchmarks of other git versions.

See the refperf script for the sorts of benchmarks that I have built
into it so far.  The refperf test is non-destructive; it always copies
from "refperf-repo" to "refperf-repo-copy" and does its tests in the
copy; therefore a test repo can be reused.  The timing data are written
to "refperf.times" and other output to "refperf.log".
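
The copy-then-time pattern described above can be sketched in Python roughly
as follows.  This is not the refperf script itself; the helper names
"fresh_copy" and "time_command" are hypothetical, and the sketch makes no
attempt to flush the disk cache, so it cannot by itself distinguish "cold"
from "warm" runs.

```python
import shutil
import subprocess
import time


def fresh_copy(src, dst):
    """Replace `dst` with a pristine copy of `src` (hypothetical helper)."""
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst, symlinks=True)


def time_command(label, argv, cwd, times_path, log_path):
    """Run `argv` in `cwd`, append its wall-clock time to `times_path`.

    The command's stdout and stderr are appended to `log_path`.
    Each result is written as "<seconds> <label>", one per line.
    """
    with open(log_path, "a") as log:
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
        elapsed = time.perf_counter() - start
    with open(times_path, "a") as times:
        times.write("%.2f %s\n" % (elapsed, label))
    return elapsed
```

A run would first call fresh_copy("refperf-repo", "refperf-repo-copy") and
then, for example, time_command("git for-each-ref", ["git", "for-each-ref"],
"refperf-repo-copy", "refperf.times", "refperf.log").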

Here are my refperf results for the "maint" branch on my notebook with
the default "make-refperf-repo" arguments (times in seconds):

3.36 git branch (cold)
0.01 git branch (warm)
0.04 git for-each-ref
3.08 git checkout (cold)
0.01 git checkout (warm)
0.00 git checkout --orphan (warm)
0.15 git checkout from detached orphan
0.12 git pack-refs
1.17 git branch (cold)
0.00 git branch (warm)
0.17 git for-each-ref
0.95 git checkout (cold)
0.00 git checkout (warm)
0.00 git checkout --orphan (warm)
0.21 git checkout from detached orphan
0.18 git branch -a --contains
7.67 git clone
0.06 git fetch (nothing)
0.01 git pack-refs
0.05 git fetch (nothing, packed)
0.10 git clone of a ref-packed repo
0.63 git fetch (everything)

Probably we should test with even more references than this, but this
test already shows that some commands are quite sluggish.

There are some more things that could be added, like:

* Branches vs. annotated tags

* References on the tips of branches in a more typical "branchy" repository.

* git describe --all

* git log --decorate

* git gc

* git filter-branch
  (This has very different performance characteristics because it is a
  script that invokes git many times.)

I suggest that we try to do systematic benchmarking of any changes that
we claim are performance optimizations and share before/after results in
the cover letter for the patch series.
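
As one possible way to produce such a before/after summary, the sketch below
compares two results files in the "<seconds> <label>" format shown earlier.
It is only an illustration; the function names are hypothetical and nothing
like it is part of the refperf branch.  Because the same label can occur more
than once in a run, entries are matched by position rather than by label.

```python
def parse_times(text):
    """Parse lines of the form '<seconds> <label>' into (label, seconds)."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        seconds, label = line.split(None, 1)
        results.append((label, float(seconds)))
    return results


def compare(before_text, after_text):
    """Return (label, before, after, delta) rows, matched by position."""
    before = parse_times(before_text)
    after = parse_times(after_text)
    if [label for label, _ in before] != [label for label, _ in after]:
        raise ValueError("the two runs did not execute the same benchmarks")
    return [(label, b, a, a - b)
            for (label, b), (_, a) in zip(before, after)]


def format_report(rows):
    """Render the comparison as plain text suitable for a cover letter."""
    lines = ["%8s %8s %8s  %s" % ("before", "after", "delta", "benchmark")]
    for label, b, a, delta in rows:
        lines.append("%8.2f %8.2f %+8.2f  %s" % (b, a, delta, label))
    return "\n".join(lines)
```

A negative delta means the command became faster.  Single runs are noisy, so
in practice each benchmark would probably need to be repeated before the
numbers are trusted.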

Michael

[1] branch hierarchical-refs at git://github.com/mhagger/git.git
[2] branch refperf at git://github.com/mhagger/git.git

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/