With this patch we have a speedup and much lower IO when importing trees with many branches. Instead of forcing index re-population for each branch switch, we keep many index files around, one per branch. Signed-off-by: Martin Langhoff <martin@xxxxxxxxxxxxxxx> --- This patch should get some review. It is trivial, but not fully tested. I am testing it on the moz repo (which will take a while) to check that I get the same result with and without it. Performance-wise, it seems to be doing ~15K commits per hour, with the mozilla repo, up from ~6Kcph on the same hardware. Of course, this is only noticeable in projects with lots of concurrent branches. Linear projects don't get much from this patch. With this change, we are now truly waiting on cvs to hand over the files pronto! Running locally, it is apparent that it isn't IO wait but the latency of the chatty cvs protocol that is making this slow. Probably forking 2 or 3 processes to prefetch filerevs from cvs and put them in a queue directory for the main process to pick up would work wonders. Actually, they could call git-hash-object and just put some file metadata in the queue directory. --- git-cvsimport.perl | 37 ++++++++++++++++++++++++++++++------- 1 files changed, 30 insertions(+), 7 deletions(-) diff --git a/git-cvsimport.perl b/git-cvsimport.perl old mode 100755 new mode 100644 index 76f6246..9c4588f --- a/git-cvsimport.perl +++ b/git-cvsimport.perl @@ -465,10 +465,15 @@ my $git_dir = $ENV{"GIT_DIR"} || ".git"; $ENV{"GIT_DIR"} = $git_dir; my $orig_git_index; $orig_git_index = $ENV{GIT_INDEX_FILE} if exists $ENV{GIT_INDEX_FILE}; -my ($git_ih, $git_index) = tempfile('gitXXXXXX', SUFFIX => '.idx', - DIR => File::Spec->tmpdir()); -close ($git_ih); -$ENV{GIT_INDEX_FILE} = $git_index; + +my %index; # holds filenames of one index per branch +{ # init with an index for origin + my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx', + DIR => File::Spec->tmpdir()); + close ($fh); + $index{$opt_o} = $fn; +} +$ENV{GIT_INDEX_FILE} = $index{$opt_o}; unless(-d $git_dir) { system("git-init-db"); die "Cannot init the GIT db at $git_tree: $?\n" if $?; @@ -496,6 +501,13 @@ unless(-d $git_dir) { $tip_at_start = `git-rev-parse --verify HEAD`; # populate index + unless ($index{$last_branch}) { + my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx', + DIR => File::Spec->tmpdir()); + close ($fh); + $index{$last_branch} = $fn; + } + $ENV{GIT_INDEX_FILE} = $index{$last_branch}; system('git-read-tree', $last_branch); die "read-tree failed: $?\n" if $?; @@ -776,8 +788,17 @@ while(<CVS>) { } if(($ancestor || $branch) ne $last_branch) { print "Switching from $last_branch to $branch\n" if $opt_v; - system("git-read-tree", $branch); - die "read-tree failed: $?\n" if $?; + unless ($index{$branch}) { + my ($fh, $fn) = tempfile('gitXXXXXX', SUFFIX => '.idx', + DIR => File::Spec->tmpdir()); + close ($fh); + $index{$branch} = $fn; + $ENV{GIT_INDEX_FILE} = $index{$branch}; + system("git-read-tree", $branch); + die "read-tree failed: $?\n" if $?; + } else { + $ENV{GIT_INDEX_FILE} = $index{$branch}; + } } $last_branch = $branch if $branch ne $last_branch; $state = 9; @@ -841,7 +862,9 @@ # VERSION:1.96->1.96.2.1 } commit() if $branch and $state != 11; -unlink($git_index); +foreach my $git_index (values %index) { + unlink($git_index); +} if (defined $orig_git_index) { $ENV{GIT_INDEX_FILE} = $orig_git_index; -- 1.4.0.g5fba - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html