Junio C Hamano schrieb: > Rene Scharfe <rene.scharfe@xxxxxxxxxxxxxx> writes: > >> Currently git-archive-tree -f tar is slower than git-tar-tree. This is >> because it is welded to the side of the existing code to minimize patch >> size, and I also suspect read_tree_recursive() to be quite a bit slower >> than builtin-tar-tree.c::traverse_tree(). > > Yes, I suspect "struct object" and friends are very inefficient > to use for things like this. "struct tree_desc" based traverser > is much preferred. Turns out the reason for git-archive-tree -f tar being 10% slower than git-tar-tree was a stupid memory leak in write_tar_entry(). *blush* It caused lots of brk() calls (i.e. system calls). In order to simplify measurement, I commented out the body of write_entry(), which both git-archive-tree and git-tar-tree are calling to write their output. The rest left is basically two pure tree traversers, git-archive-tree using read_tree_recursive() and git-tar-tree using its struct tree_desc based traverse_tree(). I then let the two chew away on the kernel repository. And as kcachegrind impressively shows, all we do with our trees and objects is dwarfed by inflate(). In both cases more than 96.6% of the cost lies within libz. That's not too surprising, because archivers need to decompress _all_ objects, not only trees. So for git-archive we can pretty much chose which traverser to use based on convenience. As a second experiment I wrote a struct tree_desc based traverser plus a matching read_tree_recursive() compatibility wrapper (included below for reference, not for inclusion) and compared the performance of 'git-ls-tree -r -t' on the kernel repository with and without it. The result is that the relative cost of all functions from tree.c combined decreased from 0.93% to 0.66%. Ugh. So while a struct tree_desc based traverser can be significantly faster than read_tree_recursive(), as soon as you actually start to do something to the trees that difference pales to insignificance. René diff --git a/tree.c b/tree.c index ea386e5..977a4aa 100644 --- a/tree.c +++ b/tree.c @@ -4,6 +4,7 @@ #include "blob.h" #include "commit.h" #include "tag.h" #include "tree-walk.h" +#include "strbuf.h" #include <stdlib.h> const char *tree_type = "tree"; @@ -227,3 +228,99 @@ struct tree *parse_tree_indirect(const u parse_object(obj->sha1); } while (1); } + +static int do_read_tree_recursive_light(struct tree_desc *desc, + struct strbuf *base, + const char **match, read_tree_fn_t fn) +{ + struct name_entry entry; + int err = 0; + int baselen = base->len; + + while (tree_entry(desc, &entry)) { + if (!match_tree_entry(base->buf, base->len, entry.path, entry.mode, match)) + continue; + + err = fn(entry.sha1, base->buf, base->len, entry.path, entry.mode, 0); + switch (err) { + case 0: + continue; + case READ_TREE_RECURSIVE: + break; + default: + return -1; + } + + if (S_ISDIR(entry.mode)) { + struct tree_desc subtree; + char type[20]; + void *buf; + int newbaselen; + + buf = read_sha1_file(entry.sha1, type, &subtree.size); + if (!buf) + return error("cannot read %s", + sha1_to_hex(entry.sha1)); + if (strcmp(type, tree_type)) { + free(buf); + return error("Object %s not a tree", + sha1_to_hex(entry.sha1)); + } + subtree.buf = buf; + + newbaselen = baselen + entry.pathlen + 1; + if (newbaselen > base->alloc) { + base->buf = xrealloc(base->buf, newbaselen); + base->alloc = newbaselen; + } + memcpy(base->buf + baselen, entry.path, entry.pathlen); + base->buf[baselen + entry.pathlen] = '/'; + base->len = newbaselen; + + err = do_read_tree_recursive_light(&subtree, + base, + match, fn); + base->len = baselen; + free(buf); + if (err) + break; + } + } + + return err; +} + +int read_tree_recursive_light(struct tree *tree, + const char *base, int baselen, int stage, + const char **match, read_tree_fn_t fn) +{ + unsigned char *sha1 = tree->object.sha1; + struct tree_desc desc; + char type[20]; + void *buf; + int err; + struct strbuf sb; + + sb.buf = xmalloc(PATH_MAX); + sb.alloc = PATH_MAX; + sb.len = 0; + if (baselen > sb.alloc) { + sb.buf = xrealloc(sb.buf, baselen); + sb.alloc = baselen; + } + memcpy(sb.buf, base, baselen); + sb.len = baselen; + + desc.buf = buf = read_sha1_file(sha1, type, &desc.size); + if (!buf) + return error("Could not read %s", sha1_to_hex(sha1)); + if (strcmp(type, tree_type)) { + free(buf); + return error("Object %s not a tree", sha1_to_hex(sha1)); + } + + err = do_read_tree_recursive_light(&desc, &sb, match, fn); + free(buf); + + return err; +} diff --git a/tree.h b/tree.h index dd25c53..2294bc2 100644 --- a/tree.h +++ b/tree.h @@ -30,4 +30,8 @@ extern int read_tree_recursive(struct tr extern int read_tree(struct tree *tree, int stage, const char **paths); +extern int read_tree_recursive_light(struct tree *tree, + const char *base, int baselen, int stage, + const char **match, read_tree_fn_t fn); + #endif /* TREE_H */ - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html