Re: [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Adam Roben <aroben@xxxxxxxxx> wrote:
> These functions are more efficient ways of executing `git hash-object -w` and
> `git cat-file blob` when you are dealing with many files/objects.
> 
> Signed-off-by: Adam Roben <aroben@xxxxxxxxx>
> ---
> Eric Wong wrote:
> > > +package Git::Commands;
> > 
> > Can this be a separate file, or a part of Git.pm?  I'm sure other
> > scripts can eventually use this and I've been meaning to split
> > git-svn.perl into separate files so it's easier to follow.
> 
> I ended up making it part of Git.pm, because I realized that made far more
> sense than splitting it into a separate file.
> 
>  perl/Git.pm |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 95 insertions(+), 2 deletions(-)

Hi Adam,

Thanks.

> diff --git a/perl/Git.pm b/perl/Git.pm
> index 46c5d10..f23edef 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -39,6 +39,9 @@ $VERSION = '0.01';
>    my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
>                                          STDERR => 0 );
>  
> +  my $sha1 = $repo->hash_and_insert_object('file.txt');
> +  my $contents = $repo->cat_blob($sha1);

I missed this the first time around.  But I'd rather be able to pass a
file handle to cat_blob for writing, instead of returning a potentially
huge string in memory.

> @@ -675,6 +677,93 @@ sub hash_object {
>  }
>  
>  
> +=item hash_and_insert_object ( FILENAME )
> +
> +Compute the SHA1 object id of the given C<FILENAME> and add the object to the
> +object database.
> +
> +The function returns the SHA1 hash.
> +
> +=cut
> +
> +# TODO: Support for passing FILEHANDLE instead of FILENAME

Filenames are fine for this input since they (are/should be) generated
by File::Temp and not from an untrusted repo.

We should, however assert that the caller of this function
isn't using a stupid filename with "\n" in it.

> +sub hash_and_insert_object {
> +	my ($self, $filename) = @_;
> +
> +	$self->_open_hash_and_insert_object_if_needed();
> +	my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
> +
> +	print $out $filename, "\n";
> +	chomp(my $hash = <$in>);
> +	return $hash;
> +}
> +
> +sub _open_hash_and_insert_object_if_needed {
> +	my ($self) = @_;
> +
> +	return if defined($self->{hash_object_pid});
> +
> +	($self->{hash_object_pid}, $self->{hash_object_in},
> +	 $self->{hash_object_out}, $self->{hash_object_ctx}) =
> +		command_bidi_pipe(qw(hash-object -w --stdin-paths));
> +}
> +
> +sub _close_hash_and_insert_object {
> +	my ($self) = @_;
> +
> +	return unless defined($self->{hash_object_pid});
> +
> +	my @vars = map { 'hash_object' . $_ } qw(pid in out ctx);

It looks like you're missing a '_' in there.

> +	command_close_bidi_pipe($self->{@vars});
> +	delete $self->{@vars};
> +}
> +


> +=item cat_blob ( SHA1 )
> +
> +Returns the contents of the blob identified by C<SHA1>.
> +
> +=cut
> +
> +sub cat_blob {
> +	my ($self, $sha1) = @_;
> +
> +	$self->_open_cat_blob_if_needed();
> +	my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
> +
> +	print $out $sha1, "\n";
> +	chomp(my $size = <$in>);
> +
> +	my $blob;
> +	my $result = read($in, $blob, $size);
> +	defined $result or carp $!;
> +
> +	# Skip past the trailing newline.
> +	read($in, my $newline, 1);
> +
> +	return $blob;
> +}

However, I'd very much like to be able to pass a file handle to this
function.  This should read()/print() to a file handle passed to it in a
loop rather than slurping all of $size at once, since the files we're
receiving can be huge.

I'd also be happier if we checked that we actually read $size bytes in
the loop, and that $newline is actually "\n" to safeguard against bugs
in cat-blob.

> +sub _open_cat_blob_if_needed {
> +	my ($self) = @_;
> +
> +	return if defined($self->{cat_blob_pid});
> +
> +	($self->{cat_blob_pid}, $self->{cat_blob_in},
> +	 $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
> +		command_bidi_pipe(qw(cat-file blob --stdin));
> +}
> +
> +sub _close_cat_blob {
> +	my ($self) = @_;
> +
> +	return unless defined($self->{cat_blob_pid});
> +
> +	my @vars = map { 'cat_blob' . $_ } qw(pid in out ctx);

It looks like you're missing a '_' here, too.

> +	command_close_bidi_pipe($self->{@vars});
> +	delete $self->{@vars};
> +}
>  

One more nit, I'm a bit paranoid, but I personally like to die/croak if
the result of every print()/syswrite() to make sure the pipe we're
writing to didn't die or if there were other error indicators.

Hopefully that's the last of tweaks I'd like to see :)

-- 
Eric Wong
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux