Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Adam Roben <aroben@xxxxxxxxx> wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.

Nice job!  I just got access to a very fast SVN repository for a project
I'm working on (not working on git-svn itself, unfortunately).

A few comments and small nitpicks below:

> Signed-off-by: Adam Roben <aroben@xxxxxxxxx>
> ---
>  git-svn.perl |   94 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 files changed, 72 insertions(+), 22 deletions(-)

> +package Git::Commands;

Can this be a separate file, or a part of Git.pm?  I'm sure other
scripts can eventually use this and I've been meaning to split
git-svn.perl into separate files so it's easier to follow.

> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
> +	    $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;

I have trouble following long lines, and most of the git code also wraps
at 80-columns.  Dead-tree publishers got this concept right a long
time ago :)

> +use strict;
> +use warnings;
> +use File::Temp qw/tempfile/;
> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
> +
> +sub _open_cat_blob_if_needed {
> +	return if defined($_cat_blob_pid);
> +	$_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";

Brian brought this up already, but yes, having pre-defined separators
instead of explicitly-specified sizes makes it all too easy for a
malicious user to commit code that will break things for git-svn users.

> +sub hash_object {
> +	my (undef, $fh) = @_;
> +
> +	my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> +	while (my $line = <$fh>) {
> +		print $tmp_fh $line;
> +	}
> +	close($tmp_fh);

Related to the above.  It's better to sysread()/syswrite() or
read()/print() in a loop with a predefined buffer size rather than to
use a readline() since you could be dealing with files with very long
lines or binaries with no newline characters in them at all.

> +	_open_hash_object_if_needed();
> +	print $_hash_object_out $tmp_filename . "\n";

Minor, but

	print $_hash_object_out $tmp_filename, "\n";

avoids creating a new string.

-- 
Eric Wong
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux