I was investigating replacing an existing Subversion setup with git, and was
mostly pleased with the results - until it came to updating a clone, which
took very much longer than the original clone.

An artificial test repository with similar features (~25000 commits, ~8000
tags, ~900 branches and a 2.5GB packfile), run locally, takes ~20m to clone
and ~48m to fetch (with no new commits in the original repository - i.e. the
fetch does not update anything) with a current code base (i.e. newer than
1.5.0-rc4).

As a side note, performance was actually better with an older version -
packed refs make things quite a bit worse (clone was only ~30m with 1.4
IIRC).

Investigation showed that the main culprit seemed to be show-ref having to
build a sorted list of all refs for every ref that was being checked. So I
used the patch below to reduce this to a single call to show-ref (unless the
ref had been updated).

With this patch the fetch time dropped to just under 1m - obviously quite a
lot faster (better than I expected, in fact).

However, this seems more band-aid than fix, and I wondered if someone more
familiar with the git internals could point me in the right direction for a
better fix, e.g. should I look at rewriting fetch in C?

diff --git a/Makefile b/Makefile
index 5d31e6d..6baf043 100644
--- a/Makefile
+++ b/Makefile
@@ -120,7 +120,7 @@ ALL_CFLAGS = $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
 STRIP ?= strip
 
-prefix = $(HOME)
+prefix = $(HOME)/git
 bindir = $(prefix)/bin
 gitexecdir = $(bindir)
 template_dir = $(prefix)/share/git-core/templates/
@@ -188,7 +188,7 @@ SCRIPT_PERL = \
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
-	  git-cherry-pick git-status git-instaweb
+	  git-cherry-pick git-status git-instaweb git-ref-diff.py
 
 # ... and all the rest that could be moved out of bindir to gitexecdir
 PROGRAMS = \
diff --git a/git-fetch.sh b/git-fetch.sh
index 357cac2..ce135a5 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -108,11 +108,12 @@ ls_remote_result=$(git ls-remote $exec "$remote") ||
 
 append_fetch_head () {
     head_="$1"
-    remote_="$2"
-    remote_name_="$3"
-    remote_nick_="$4"
-    local_name_="$5"
-    case "$6" in
+    local_head_="$2"
+    remote_="$3"
+    remote_name_="$4"
+    remote_nick_="$5"
+    local_name_="$6"
+    case "$7" in
     t) not_for_merge_='not-for-merge' ;;
     '') not_for_merge_= ;;
     esac
@@ -151,10 +152,15 @@ append_fetch_head () {
 	echo "$head_ not-for-merge $note_" >>"$GIT_DIR/FETCH_HEAD"
     fi
 
-    update_local_ref "$local_name_" "$head_" "$note_"
+    update_local_ref "$local_name_" "$head_" "$note_" "$local_head_"
 }
 
 update_local_ref () {
+    if [ "$2" == "$4" ]; then
+	[ "$verbose" ] && echo >&2 "* $1: same as $3"
+	return 0
+    fi
+
     # If we are storing the head locally make sure that it is
     # a fast forward (aka "reverse push").
 
@@ -392,7 +398,7 @@ fetch_main () {
 	  (
 	      git-fetch-pack --thin $exec $keep $shallow_depth "$remote" $rref ||
 	      echo failed "$remote"
-	  ) |
+	  ) | git-ref-diff.py "$reflist" |
 	  (
 		trap '
 			if test -n "$keepfile" && test -f "$keepfile"
@@ -402,7 +408,7 @@ fetch_main () {
 		' 0
 
 		keepfile=
-		while read sha1 remote_name
+		while read sha1 remote_name local_sha1
 		do
 		  case "$sha1" in
 		  failed)
@@ -441,7 +447,7 @@ fetch_main () {
 		      esac
 		  done
 		  local_name=$(expr "z$found" : 'z[^:]*:\(.*\)')
-		  append_fetch_head "$sha1" "$remote" \
+		  append_fetch_head "$sha1" "$local_sha1" "$remote" \
 			  "$remote_name" "$remote_nick" "$local_name" \
 			  "$not_for_merge" || exit
 		done
diff --git a/git-ref-diff.py b/git-ref-diff.py
new file mode 100755
index 0000000..2b30e4c
--- /dev/null
+++ b/git-ref-diff.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python
+
+import os
+import re
+import sys
+
+ref_map_re = re.compile("^\.?\+?(?P<remote>.*?):(?P<local>.*)$")
+
+refs = {}
+refsp = os.popen("git-show-ref")
+for ref in refsp.readlines():
+    (sha, ref) = ref.strip().split(' ')
+    refs[ref] = sha
+refsp.close()
+
+ref_map = {}
+for line in sys.argv[1].split('\n'):
+    ref_map_m = ref_map_re.search(line)
+    if ref_map_m:
+        remote = ref_map_m.group('remote')
+        local = ref_map_m.group('local')
+        ref_map[remote] = local
+
+while True:
+    try:
+        (sha, ref) = raw_input().split(' ')
+    except EOFError:
+        sys.exit(0)
+    lref = ref_map.get(ref, None)
+    if refs.has_key(lref):
+        print "%s %s %s" % (sha, ref, refs[lref])
+    else:
+        print "%s %s -" % (sha, ref)

-- 
Julian

 ---
Why bother building any more nuclear warheads until we use the ones we have?
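
To illustrate the single-pass lookup the patch relies on, here is a minimal sketch of the same idea written as standalone functions. It is not part of the patch: it assumes Python 3, takes the show-ref output and the ref list as strings instead of running git, and the function names (parse_show_ref, parse_ref_map, annotate) are hypothetical. The point is only that all local refs are read once into a dictionary, after which each fetched ref costs one dictionary lookup instead of one show-ref invocation.

```python
import re

# Same pattern as in git-ref-diff.py: optional "." and "+" prefixes,
# then "<remote>:<local>".
REF_MAP_RE = re.compile(r"^\.?\+?(?P<remote>.*?):(?P<local>.*)$")


def parse_show_ref(output):
    """Build {ref name: sha} from show-ref style text ("<sha> <ref>" per line)."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split(" ", 1)
            refs[ref] = sha
    return refs


def parse_ref_map(reflist):
    """Build {remote ref: local ref} from newline-separated ref mappings."""
    ref_map = {}
    for line in reflist.splitlines():
        m = REF_MAP_RE.search(line)
        if m:
            ref_map[m.group("remote")] = m.group("local")
    return ref_map


def annotate(fetched, refs, ref_map):
    """Yield (sha, remote ref, local sha), with "-" when no local ref exists."""
    for sha, ref in fetched:
        local = ref_map.get(ref)
        yield sha, ref, refs.get(local, "-")
```

A caller would then skip the update for any ref whose first and third fields are equal, which is what the early return added to update_local_ref does in the shell script.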