[PATCH 2/2] fast-export: do not load blob objects twice

When fast-export wants to export a blob object, it first
calls parse_object to get a "struct object" and check
whether we have already shown the object.  If we haven't
shown it, we then use read_sha1_file to pull it from disk
and write it out.

That means we load each blob from disk twice: once for
parse_object to find its type and check its sha1, and a
second time when we actually output it. We can drop this to
a single load by using lookup_object to check the SHOWN
flag, and then reading the blob once, checking its sha1
signature against that buffer, and writing out the same
buffer.

This provides a modest speedup of about 4% in wall-clock
time on git.git (best-of-five,
"git fast-export HEAD >/dev/null"):

  [before]                [after]
  real    0m14.347s       real    0m13.780s
  user    0m14.084s       user    0m13.620s
  sys     0m0.208s        sys     0m0.100s

and somewhat more, about 15%, on a more blob-heavy repo
(this is a repository full of media files):

  [before]                [after]
  real    0m52.236s       real    0m44.451s
  user    0m50.568s       user    0m43.000s
  sys     0m1.536s        sys     0m1.284s

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
We actually spend a non-trivial amount of time re-checking the sha1 of
objects we are loading. This change also makes it easy to drop that
checking, though perhaps the additional safety is a good thing to have
during an export. The timings without it are:

  git.git (was 13.780s)
  real    0m11.452s
  user    0m11.336s
  sys     0m0.072s

  photos (was 44.451s)
  real    0m18.383s
  user    0m17.108s
  sys     0m1.224s
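
For readers who want the load-once pattern from the commit message in
isolation, here is a minimal standalone sketch. It is not git code:
every toy_* name, the TOY_SHOWN flag, and the toy checksum are
hypothetical stand-ins (for the object store, the SHOWN flag, and the
sha1 check respectively), chosen only so that the example compiles on
its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's object store; none of these are git APIs. */
#define TOY_SHOWN 1u

struct toy_object {
	const char *data;	/* "on-disk" contents */
	unsigned checksum;	/* expected checksum, playing the role of the sha1 */
	unsigned flags;		/* in-memory flags such as TOY_SHOWN */
};

static int toy_load_count;	/* number of simulated disk reads */

static unsigned toy_checksum(const char *buf, size_t len)
{
	unsigned sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31 + (unsigned char)buf[i];
	return sum;
}

/* The expensive step: copy the contents out, as a disk read would. */
static char *toy_read(const struct toy_object *obj, size_t *len)
{
	char *buf;

	toy_load_count++;
	*len = strlen(obj->data);
	buf = malloc(*len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, obj->data, *len + 1);
	return buf;
}

/* Returns 1 if the object was written, 0 if already shown, -1 on error. */
static int toy_export(struct toy_object *obj, FILE *out)
{
	size_t len;
	char *buf;

	assert(obj && out);

	/* Cheap in-memory flag check; nothing is loaded for this. */
	if (obj->flags & TOY_SHOWN)
		return 0;

	/* The only load: verify and output the same buffer. */
	buf = toy_read(obj, &len);
	if (!buf)
		return -1;
	if (toy_checksum(buf, len) != obj->checksum) {
		free(buf);
		return -1;
	}
	fwrite(buf, 1, len, out);

	obj->flags |= TOY_SHOWN;
	free(buf);
	return 1;
}
```

Calling toy_export() twice on the same object performs one simulated
read in total, since the second call returns at the flag check. Dropping
the toy_checksum() comparison corresponds to the unchecked timings
above.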

 builtin/fast-export.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3eba852..d380155 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -119,6 +119,7 @@ static void export_blob(const unsigned char *sha1)
 	enum object_type type;
 	char *buf;
 	struct object *object;
+	int eaten;
 
 	if (no_data)
 		return;
@@ -126,16 +127,18 @@ static void export_blob(const unsigned char *sha1)
 	if (is_null_sha1(sha1))
 		return;
 
-	object = parse_object(sha1);
-	if (!object)
-		die ("Could not read blob %s", sha1_to_hex(sha1));
-
-	if (object->flags & SHOWN)
+	object = lookup_object(sha1);
+	if (object && object->flags & SHOWN)
 		return;
 
 	buf = read_sha1_file(sha1, &type, &size);
 	if (!buf)
 		die ("Could not read blob %s", sha1_to_hex(sha1));
+	if (check_sha1_signature(sha1, buf, size, typename(type)) < 0)
+		die("sha1 mismatch in blob %s", sha1_to_hex(sha1));
+	object = parse_object_buffer(sha1, type, size, buf, &eaten);
+	if (!object)
+		die("Could not read blob %s", sha1_to_hex(sha1));
 
 	mark_next_object(object);
 
@@ -147,7 +150,8 @@ static void export_blob(const unsigned char *sha1)
 	show_progress();
 
 	object->flags |= SHOWN;
-	free(buf);
+	if (!eaten)
+		free(buf);
 }
 
 static int depth_first(const void *a_, const void *b_)
-- 
1.8.2.rc2.7.gef06216