Re: [PATCH] Fix in Git.pm cat_blob crashes on large files

Jeff King <peff@xxxxxxxx> · Thu, 21 Feb 2013 17:43:20 -0500

On Thu, Feb 21, 2013 at 02:13:32PM -0800, Joshua Clayton wrote:

> Greetings.
> This is my first patch here. Hopefully I get the stylistic & political
> details right... :)
> Patch applies against maint and master

I have some comments. :)

The body of your email should contain the commit message (i.e., whatever
people reading "git log" a year from now would see). Cover letter bits
like this should go after the "---". That way "git am" knows which part
is which.

>         Developer's Certificate of Origin 1.1

You don't need to include the DCO. Your "Signed-off-by" is an indication
that you agree to it.

> Affects git svn clone/fetch
> Original code loaded entire file contents into a variable
> before writing to disk. If the offset within the variable passed
> 2 GiB, it becrame negative, resulting in a crash.

Interesting. I didn't think perl had signed wrap-around issues like
this, as its numeric variables are not strictly integers. But I don't
have a 32-bit machine to test on (and numbers larger than 2G obviously
work on 64-bit machines). At any rate, though:

> On a 32 bit system, or a system with low memory it may crash before
> reaching 2 GiB due to memory exhaustion.

Yeah, it is stupid to read the whole thing into memory if we are just
going to dump it to another filehandle.

> @@ -949,13 +951,21 @@ sub cat_blob {
>  		last unless $bytesLeft;
> 
>  		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
> -		my $read = read($in, $blob, $bytesToRead, $bytesRead);
> +		my $read = read($in, $blob, $bytesToRead, $blobSize);
>  		unless (defined($read)) {
>  			$self->_close_cat_blob();
>  			throw Error::Simple("in pipe went bad");
>  		}

Hmph. The existing code already reads in 1024-byte chunks. For no
reason, as far as I can tell, since we are just loading the blob buffer
incrementally into memory, only to then flush it all out at once.

Why do you read at the $blobSize offset? If we are just reading in
chunks, we be able to just keep writing to the start of our small
buffer, as we flush each chunk out before trying to read more.

IOW, shouldn't the final code look like this:

  my $bytesLeft = $size;
  while ($bytesLeft > 0) {
          my $buf;
          my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
          my $read = read($in, $buf, $bytesToRead);
          unless (defined($read)) {
                  $self->_close_cat_blob();
                  throw Error::Simple("unable to read cat-blob pipe");
          }
          unless (print $fh $buf) {
                  $self->_close_cat_blob();
                  throw Error::Simple("unable to write blob output");
          }

          $bytesLeft -= $read;
  }

By having the read and flush size be the same, it's much simpler.

Your change (and my proposed code) do mean that an error during the read
operation will result in a truncated output file, rather than an empty
one. I think that is OK, though. That can happen anyway in the original
due to a failure in the "print" step. Any caller who wants to be careful
that they leave only a full file in place must either:

  1. Check the return value of cat_blob and verify that the result has
     $size bytes, and otherwise delete it.

  2. Write to a temporary file, then once success has been returned from
     cat_blob, rename the result into place.

Neither of which is affected by this change.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html