Re: performance problem: "git commit filename"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 12 Jan 2008, Linus Torvalds wrote:
> 
> HOWEVER. When that logic was converted from that shell-script into a 
> builtin-commit.c, that conversion was not done correctly. The old "git 
> read-tree -i -m" was not translated as a "unpack_trees()" call, but as 
> this in prepare_index():
> 
> 	discard_cache()
> 	..
> 	tree = parse_tree_indirect(head_sha1);
> 	..
> 	read_tree(tree, 0, NULL)
> 
> which is very wrong, because it replaces the old index entirely, and 
> doesn't do that stat information merging.

This patch may or may not fix it.

It makes builtin-commit.c use the same logic that "git read-tree -i -m" 
does (which is what the old shell script did), and it seems to pass the 
test-suite, and it looks pretty obvious.

It also brings down the number of open/mmap/munmap/close calls to where it 
should be, although it still does *way* too many "lstat()" operations (ie 
it does 4*lstat for each file in the index - one more than the 
non-filename one does).

With that fixed, performance is also roughly where it should be (ie the 
17-18s for the cold-cache case), because it no longer needs to rehash all 
the files!

HOWEVER. This was just a quick hack, and while it all looks sane, this is 
some damn core code. Somebody else should double- and triple-check this.

[ That 4x lstat thing bothers me. I think we should add a flag to the 
  index saying "we checked this once already, it's clean", so that if we 
  do multiple passes over the index, we can still do just a single lstat() 
  on just the first pass. But that's a separate issue.

  On Linux, a cached lstat() is almost free. Well, at least compared to 
  all the crap operating systems out there. And obviously, if you do 
  multiple lstat's per file, all but the first one *will* be cached.

  However, "almost free" still isn't zero, and with the kernel having 23k 
  files in it, doing almost a hundred thousand lstat's is still something 
  that only takes about half a second or so for me. We _really_ should do 
  only ~23k or so of them, and the cached cache should take on the order 
  of 0.15s, rather than half a second!

  So this is worth optimizing. With bigger repositories, it's going to be 
  more noticeable, and with other operating systems, all those lstat()'s 
  will cost much _much_ more. Of course, any IO overhead will be much 
  bigger, so this is mostly a cached-case issue, but cached-case is still 
  important.. ]

Anyway, consider this being conditionally signed-off-by: me, assuming 
a few other people spend a bit of time double-checking all my logic.

Please?

			Linus

---
 builtin-commit.c |   37 ++++++++++++++++++++++++++++---------
 1 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/builtin-commit.c b/builtin-commit.c
index 73f1e35..cc5134e 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -21,6 +21,7 @@
 #include "utf8.h"
 #include "parse-options.h"
 #include "path-list.h"
+#include "unpack-trees.h"
 
 static const char * const builtin_commit_usage[] = {
 	"git-commit [options] [--] <filepattern>...",
@@ -177,10 +178,34 @@ static void add_remove_files(struct path_list *list)
 	}
 }
 
+static void create_base_index(void)
+{
+	struct tree *tree;
+	struct unpack_trees_options opts;
+	struct tree_desc t;
+
+	if (initial_commit) {
+		discard_cache();
+		return;
+	}
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.index_only = 1;
+	opts.merge = 1;
+	
+	opts.fn = oneway_merge;
+	tree = parse_tree_indirect(head_sha1);
+	if (!tree)
+		die("failed to unpack HEAD tree object");
+	parse_tree(tree);
+	init_tree_desc(&t, tree->buffer, tree->size);
+	unpack_trees(1, &t, &opts);
+}
+
 static char *prepare_index(int argc, const char **argv, const char *prefix)
 {
 	int fd;
-	struct tree *tree;
 	struct path_list partial;
 	const char **pathspec = NULL;
 
@@ -278,14 +303,8 @@ static char *prepare_index(int argc, const char **argv, const char *prefix)
 
 	fd = hold_lock_file_for_update(&false_lock,
 				       git_path("next-index-%d", getpid()), 1);
-	discard_cache();
-	if (!initial_commit) {
-		tree = parse_tree_indirect(head_sha1);
-		if (!tree)
-			die("failed to unpack HEAD tree object");
-		if (read_tree(tree, 0, NULL))
-			die("failed to read HEAD tree object");
-	}
+
+	create_base_index();
 	add_remove_files(&partial);
 	refresh_cache(REFRESH_QUIET);
 
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux