Re: [PATCH][RFC] Add git-archive-tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano schrieb:
> Rene Scharfe <rene.scharfe@xxxxxxxxxxxxxx> writes:
> 
>> Currently git-archive-tree -f tar is slower than git-tar-tree.  This is
>> because it is welded to the side of the existing code to minimize patch
>> size, and I also suspect read_tree_recursive() to be quite a bit slower
>> than builtin-tar-tree.c::traverse_tree().
> 
> Yes, I suspect "struct object" and friends are very inefficient
> to use for things like this.  "struct tree_desc" based traverser
> is much preferred.

Turns out the reason for git-archive-tree -f tar being 10% slower than
git-tar-tree was a stupid memory leak in write_tar_entry(). *blush*
It caused lots of brk() calls (i.e. system calls).

In order to simplify measurement, I commented out the body of
write_entry(), which both git-archive-tree and git-tar-tree are calling
to write their output.  The rest left is basically two pure tree
traversers, git-archive-tree using read_tree_recursive() and git-tar-tree
using its struct tree_desc based traverse_tree().

I then let the two chew away on the kernel repository.  And as
kcachegrind impressively shows, all we do with our trees and objects is
dwarfed by inflate().  In both cases more than 96.6% of the cost lies
within libz.  That's not too surprising, because archivers need to
decompress _all_ objects, not only trees.

So for git-archive we can pretty much chose which traverser to use based
on convenience.

As a second experiment I wrote a struct tree_desc based traverser plus a
matching read_tree_recursive() compatibility wrapper (included below for
reference, not for inclusion) and compared the performance of
'git-ls-tree -r -t' on the kernel repository with and without it.  The
result is that the relative cost of all functions from tree.c combined
decreased from 0.93% to 0.66%.  Ugh.

So while a struct tree_desc based traverser can be significantly faster
than read_tree_recursive(), as soon as you actually start to do something
to the trees that difference pales to insignificance.

René



diff --git a/tree.c b/tree.c
index ea386e5..977a4aa 100644
--- a/tree.c
+++ b/tree.c
@@ -4,6 +4,7 @@ #include "blob.h"
 #include "commit.h"
 #include "tag.h"
 #include "tree-walk.h"
+#include "strbuf.h"
 #include <stdlib.h>
 
 const char *tree_type = "tree";
@@ -227,3 +228,99 @@ struct tree *parse_tree_indirect(const u
 			parse_object(obj->sha1);
 	} while (1);
 }
+
+static int do_read_tree_recursive_light(struct tree_desc *desc,
+                                        struct strbuf *base,
+                                        const char **match, read_tree_fn_t fn)
+{
+	struct name_entry entry;
+	int err = 0;
+	int baselen = base->len;
+
+	while (tree_entry(desc, &entry)) {
+		if (!match_tree_entry(base->buf, base->len, entry.path, entry.mode, match))
+			continue;
+
+		err = fn(entry.sha1, base->buf, base->len, entry.path, entry.mode, 0);
+		switch (err) {
+		case 0:
+			continue;
+		case READ_TREE_RECURSIVE:
+			break;
+		default:
+			return -1;
+		}
+
+		if (S_ISDIR(entry.mode)) {
+			struct tree_desc subtree;
+			char type[20];
+			void *buf;
+			int newbaselen;
+
+			buf = read_sha1_file(entry.sha1, type, &subtree.size);
+			if (!buf)
+				return error("cannot read %s",
+				             sha1_to_hex(entry.sha1));
+			if (strcmp(type, tree_type)) {
+				free(buf);
+				return error("Object %s not a tree",
+				             sha1_to_hex(entry.sha1));
+			}
+			subtree.buf = buf;
+
+			newbaselen = baselen + entry.pathlen + 1;
+			if (newbaselen > base->alloc) {
+				base->buf = xrealloc(base->buf, newbaselen);
+				base->alloc = newbaselen;
+			}
+			memcpy(base->buf + baselen, entry.path, entry.pathlen);
+			base->buf[baselen + entry.pathlen] = '/';
+			base->len = newbaselen;
+
+			err = do_read_tree_recursive_light(&subtree,
+			                                   base,
+			                                   match, fn);
+			base->len = baselen;
+			free(buf);
+			if (err)
+				break;
+		}
+	}
+
+	return err;
+}
+
+int read_tree_recursive_light(struct tree *tree,
+                              const char *base, int baselen, int stage,
+                              const char **match, read_tree_fn_t fn)
+{
+	unsigned char *sha1 = tree->object.sha1;
+	struct tree_desc desc;
+	char type[20];
+	void *buf;
+	int err;
+	struct strbuf sb;
+
+	sb.buf = xmalloc(PATH_MAX);
+	sb.alloc = PATH_MAX;
+	sb.len = 0;
+	if (baselen > sb.alloc) {
+		sb.buf = xrealloc(sb.buf, baselen);
+		sb.alloc = baselen;
+	}
+	memcpy(sb.buf, base, baselen);
+	sb.len = baselen;
+
+	desc.buf = buf = read_sha1_file(sha1, type, &desc.size);
+	if (!buf)
+		return error("Could not read %s", sha1_to_hex(sha1));
+	if (strcmp(type, tree_type)) {
+		free(buf);
+		return error("Object %s not a tree", sha1_to_hex(sha1));
+	}
+
+	err = do_read_tree_recursive_light(&desc, &sb, match, fn);
+	free(buf);
+
+	return err;
+}
diff --git a/tree.h b/tree.h
index dd25c53..2294bc2 100644
--- a/tree.h
+++ b/tree.h
@@ -30,4 +30,8 @@ extern int read_tree_recursive(struct tr
 
 extern int read_tree(struct tree *tree, int stage, const char **paths);
 
+extern int read_tree_recursive_light(struct tree *tree,
+                                     const char *base, int baselen, int stage,
+                                     const char **match, read_tree_fn_t fn);
+
 #endif /* TREE_H */
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]