Re: Git is not scalable with too many refs/*

On Tue, 27 Sep 2011 13:07:15 +0200, Michael Haggerty wrote:
> On 09/27/2011 11:01 AM, Julian Phillips wrote:
>> It has to be hot-cache, otherwise time taken to read the refs from disk
>> will mean that it is always slow.  On my Mac it seems to _always_ be
>> slow reading the refs from disk, so even the "fast" case still takes ~17m.
>
> This case should be helped by lazy-loading of loose references, which I
> am working on.  So if you develop some benchmarking code, it would help
> me with my work.

The attached script creates the repo structure I was testing with ...

If you create a repo with 100k refs it takes quite a while to read the refs from disk. If you are lazy-loading then it should take practically no time, since the only interesting ref is refs/heads/master.
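To make the difference concrete, here is a minimal sketch (not git's actual implementation) contrasting an eager read of every loose ref with a lazy read of just one. The function names are made up for illustration, and the sketch deliberately ignores packed-refs and symbolic refs.

```python
import os

def read_all_loose_refs(git_dir):
    """Eagerly read every loose ref under <git_dir>/refs.

    The cost grows with the number of ref files, which is what makes
    a repo with many refs/changes/* entries slow.
    """
    refs = {}
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Ref names always use "/" regardless of the OS path separator.
            name = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path) as f:
                refs[name] = f.read().strip()
    return refs

def read_one_loose_ref(git_dir, refname):
    """Lazily read a single loose ref; return None if there is no such file.

    Only one file is opened, so the cost does not depend on how many
    other refs exist.
    """
    path = os.path.join(git_dir, *refname.split("/"))
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None
```

For the repo built by the attached script, read_one_loose_ref("c/.git", "refs/heads/master") would open one file, while read_all_loose_refs("c/.git") would open one file per ref.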

The following is the hot-cache timing for "./refs-stress c 40000", with the sorting patch applied (I wasn't prepared to wait for the numbers with 100k refs).

jp3@rayne: refs>(cd c; time ~/misc/git/git/git branch)
* master

real    0m0.885s
user    0m0.161s
sys     0m0.722s

After doing "rm -rf c/.git/refs/changes/*", I get:

jp3@rayne: refs>(cd c; time ~/misc/git/git/git branch)
* master

real    0m0.004s
user    0m0.001s
sys     0m0.002s
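For anyone wanting to repeat the comparison, the timing could be scripted along these lines. This is only a sketch: time_command is a hypothetical helper, not part of the attached script, and it measures wall-clock time only (the "real" figure above).

```python
import subprocess
import time

def time_command(argv, cwd=None, repeat=3):
    """Run argv `repeat` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        # Discard the command's stdout so printing does not skew the timing.
        subprocess.check_call(argv, cwd=cwd, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling time_command(["git", "branch"], cwd="c", repeat=4) and discarding the first result as a warm-up run should give roughly hot-cache numbers, assuming "c" is the repo created by the script.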

--
Julian
#!/usr/bin/env python

import os
import random
import subprocess
import sys

def die(msg):
    # Report the error on stderr and exit with a non-zero status.
    sys.stderr.write(msg + "\n")
    sys.exit(1)

def new_ref(a, b, commit):
    # Write a loose ref at refs/changes/<a>/<b>/<e>, where <e> is the first
    # unused positive integer in that directory.
    d = ".git/refs/changes/%d/%d" % (a, b)
    if not os.path.exists(d):
        os.makedirs(d)
    e = 1
    p = "%s/%d" % (d, e)
    while os.path.exists(p):
        e += 1
        p = "%s/%d" % (d, e)
    with open(p, "w") as f:
        f.write(commit)

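def make_refs(count, commit):
    # Create `count` loose refs, all pointing at `commit`, spread over
    # randomly chosen refs/changes/<a>/<b>/ directories.
    while count > 0:
        sys.stdout.write("left: %d%s\r" % (count, " " * 30))
        a = random.randrange(10, 30)
        b = random.randrange(10000, 50000)
        new_ref(a, b, commit)
        count -= 1
    print("refs complete")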

def main():
    if len(sys.argv) != 3:
        die("usage: %s <name> <ref count>" % sys.argv[0])

    _, name, refs = sys.argv

    os.mkdir(name)
    os.chdir(name)

    if subprocess.call(["git", "init"]) != 0:
        die("failed to init repo")

    f = open("foobar.txt", "w")
    f.write("%s: %s refs\n" % (name, refs))
    f.close()

    if subprocess.call(["git", "add", "foobar.txt"]) != 0:
        die("failed to add foobar.txt")

    if subprocess.call(["git", "commit", "-m", "initial commit"]) != 0:
        die("failed to create initial commit")

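    # universal_newlines=True makes check_output return text rather than
    # bytes on Python 3, so the hash can be written to the ref files as-is.
    commit = subprocess.check_output(["git", "show-ref", "-s", "master"],
                                     universal_newlines=True).strip()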

    make_refs(int(refs), commit)

if __name__ == "__main__":
    main()
