Re: [PATCH] git-svn: Make it scream by minimizing temp files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Marcus Griep <marcus@xxxxxxxx> wrote:
> Currently, git-svn would create a temp file on four occasions:
> 1. Reading a blob out of the object db
> 2. Creating a delta from svn
> 3. Hashing and writing a blob into the object db
> 4. Reading a blob out of the object db (in another place in code)
> 
> Any time git-svn did the above, it would dutifully create and then
> delete said temp file.  Unfortunately, this means that between 2-4
> temporary files are created/deleted per file 'add/modify'-ed in
> svn (O(n)).  This causes significant overhead and helps the inode
> counter to spin beautifully.
> 
> By its nature, git-svn is a serial beast.  Thus, reusing a temp file
> does not pose significant problems.  "truncate and seek" takes much
> less time than "unlink and create".  This patch centralizes the
> tempfile creation and holds onto the tempfile until they are deleted
> on exit.  This significantly reduces file overhead, now requiring
> at most three (3) temp files per run (O(1)).

Wow.  I've considered this in the past didn't think there would be a
significant difference (of course I'm always network I/O bound).  Which
platform and filesystem are you using are you using for tests?

I don't notice any difference running the test suite on Linux + ext3
here, but the test suite is not a good benchmark :)

> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -1282,6 +1282,8 @@ use Carp qw/croak/;
>  use File::Path qw/mkpath/;
>  use File::Copy qw/copy/;
>  use IPC::Open3;
> +use File::Temp qw/ :seekable /;

qw/ :seekable / does not appear in my version of Perl (5.8.8-7etch3 from
Debian stable)  Just having "use File::Temp;" there works for me.

>  sub resolve_local_globs {
> @@ -2932,6 +2935,23 @@ sub remove_username {
>  	$_[0] =~ s{^([^:]*://)[^@]+@}{$1};
>  }
>  
> +sub _temp_file {
> +	my ($self, $fd, $autoflush) = @_;
> +	if (defined $TEMP_FILES{$fd}) {
> +		truncate $TEMP_FILES{$fd}, 0 or croak $!;
> +		seek $TEMP_FILES{$fd}, 0, 0 or croak $!;

Perhaps a sysseek in addition to the seek above would help
with the problems you mentioned in the other email.

		sysseek $TEMP_FILES{$fd}, 0, 0 or croak $!;

(It doesn't seem to affect me when running the test suite, though).

> +	} else {
> +		$TEMP_FILES{$fd} = File::Temp->new(
> +									TEMPLATE => 'GitSvn_XXXXXX',
> +									DIR => File::Spec->tmpdir
> +									) or croak $!;

Way too much indentation :x

> +		if (defined $autoflush) {
> +			$TEMP_FILES{$fd}->autoflush($autoflush);
> +		}

Given how much we interact with external programs, I'd rather force
every autoflush on every file handle to avoid subtle bugs on
different platforms.  It's faster in some (most?) cases, too.


Also, this seems generic enough that other programs (git-cvsimport
perhaps) can probably use it, too.  So maybe it could go into Git.pm or
a new module, Git/Tempfile.pm?

-- 
Eric Wong
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux