[RFC] Speeding up a null fetch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I was investigating replacing an existing subversion setup with git, and
was mostly pleased with the results - until it came to trying to update a
clone ... which took very much longer than the original clone.

An artifical test repository that has similar features (~25000 commits,
~8000 tags, ~900 branches and a 2.5Gb packfile) when running locally
takes ~20m to clone and ~48m to fetch (with no new commits in the
original repository - i.e. the fetch does not update anything) with a
current code base (i.e. newer than 1.5.0-rc4).  As a side note,
performance was actually better with an older version - packed refs
makes things quite a bit worse (clone was only ~30m with 1.4 IIRC).

Investigation showed that the main culprit seemed to be show-ref
having to build a sorted list of all refs for every ref that was being
checked.  So I used the patch below to reduce this to a single call to
show-ref (unless the ref had been updated).  With this patch the fetch
timed dropped to just under 1m - obviously quite a lot faster (better
than I expected in fact).

However, this seems more band-aid than fix, and I wondered if someone
more familiar with the git internals could point me in the right
direction for a better fix, e.g. should I look at rewriting fetch in C?

diff --git a/Makefile b/Makefile
index 5d31e6d..6baf043 100644
--- a/Makefile
+++ b/Makefile
@@ -120,7 +120,7 @@ ALL_CFLAGS = $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
 STRIP ?= strip
 
-prefix = $(HOME)
+prefix = $(HOME)/git
 bindir = $(prefix)/bin
 gitexecdir = $(bindir)
 template_dir = $(prefix)/share/git-core/templates/
@@ -188,7 +188,7 @@ SCRIPT_PERL = \
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
-	  git-cherry-pick git-status git-instaweb
+	  git-cherry-pick git-status git-instaweb git-ref-diff.py
 
 # ... and all the rest that could be moved out of bindir to gitexecdir
 PROGRAMS = \
diff --git a/git-fetch.sh b/git-fetch.sh
index 357cac2..ce135a5 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -108,11 +108,12 @@ ls_remote_result=$(git ls-remote $exec "$remote") ||
 
 append_fetch_head () {
     head_="$1"
-    remote_="$2"
-    remote_name_="$3"
-    remote_nick_="$4"
-    local_name_="$5"
-    case "$6" in
+    local_head_="$2"
+    remote_="$3"
+    remote_name_="$4"
+    remote_nick_="$5"
+    local_name_="$6"
+    case "$7" in
     t) not_for_merge_='not-for-merge' ;;
     '') not_for_merge_= ;;
     esac
@@ -151,10 +152,15 @@ append_fetch_head () {
 	echo "$head_	not-for-merge	$note_" >>"$GIT_DIR/FETCH_HEAD"
     fi
 
-    update_local_ref "$local_name_" "$head_" "$note_"
+    update_local_ref "$local_name_" "$head_" "$note_" "$local_head_"
 }
 
 update_local_ref () {
+    if [ "$2" == "$4" ]; then
+	[ "$verbose" ] && echo >&2 "* $1: same as $3"
+	return 0
+    fi
+
     # If we are storing the head locally make sure that it is
     # a fast forward (aka "reverse push").
 
@@ -392,7 +398,7 @@ fetch_main () {
       (
 	  git-fetch-pack --thin $exec $keep $shallow_depth "$remote" $rref ||
 	  echo failed "$remote"
-      ) |
+      ) | git-ref-diff.py "$reflist" |
       (
 	trap '
 		if test -n "$keepfile" && test -f "$keepfile"
@@ -402,7 +408,7 @@ fetch_main () {
 	' 0
 
         keepfile=
-	while read sha1 remote_name
+	while read sha1 remote_name local_sha1
 	do
 	  case "$sha1" in
 	  failed)
@@ -441,7 +447,7 @@ fetch_main () {
 	      esac
 	  done
 	  local_name=$(expr "z$found" : 'z[^:]*:\(.*\)')
-	  append_fetch_head "$sha1" "$remote" \
+	  append_fetch_head "$sha1" "$local_sha1" "$remote" \
 		  "$remote_name" "$remote_nick" "$local_name" \
 		  "$not_for_merge" || exit
         done
diff --git a/git-ref-diff.py b/git-ref-diff.py
new file mode 100755
index 0000000..2b30e4c
--- /dev/null
+++ b/git-ref-diff.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python
+
+import os
+import re
+import sys
+
+ref_map_re = re.compile("^\.?\+?(?P<remote>.*?):(?P<local>.*)$")
+
+refs = {}
+refsp = os.popen("git-show-ref")
+for ref in refsp.readlines():
+    (sha, ref) = ref.strip().split(' ')
+    refs[ref] = sha
+refsp.close()
+
+ref_map = {}
+for line in sys.argv[1].split('\n'):
+    ref_map_m = ref_map_re.search(line)
+    if ref_map_m:
+        remote = ref_map_m.group('remote')
+        local = ref_map_m.group('local')
+        ref_map[remote] = local
+
+while True:
+    try:
+        (sha, ref) = raw_input().split(' ')
+    except EOFError:
+        sys.exit(0)
+    lref = ref_map.get(ref, None)
+    if refs.has_key(lref):
+        print "%s %s %s" % (sha, ref, refs[lref])
+    else:
+        print "%s %s -" % (sha, ref)

-- 
Julian

 --- 
Why bother building any more nuclear warheads until we use the ones we have?
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]