On Thu, Feb 21, 2013 at 2:43 PM, Jeff King <peff@xxxxxxxx> wrote: > On Thu, Feb 21, 2013 at 02:13:32PM -0800, Joshua Clayton wrote: > >> Greetings. >> This is my first patch here. Hopefully I get the stylistic & political >> details right... :) >> Patch applies against maint and master > > I have some comments. :) > > The body of your email should contain the commit message (i.e., whatever > people reading "git log" a year from now would see). Cover letter bits > like this should go after the "---". That way "git am" knows which part > is which. > >> Developer's Certificate of Origin 1.1 > > You don't need to include the DCO. Your "Signed-off-by" is an indication > that you agree to it. > >> Affects git svn clone/fetch >> Original code loaded entire file contents into a variable >> before writing to disk. If the offset within the variable passed >> 2 GiB, it becrame negative, resulting in a crash. > > Interesting. I didn't think perl had signed wrap-around issues like > this, as its numeric variables are not strictly integers. But I don't > have a 32-bit machine to test on (and numbers larger than 2G obviously > work on 64-bit machines). At any rate, though: > >> On a 32 bit system, or a system with low memory it may crash before >> reaching 2 GiB due to memory exhaustion. > > Yeah, it is stupid to read the whole thing into memory if we are just > going to dump it to another filehandle. > >> @@ -949,13 +951,21 @@ sub cat_blob { >> last unless $bytesLeft; >> >> my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024; >> - my $read = read($in, $blob, $bytesToRead, $bytesRead); >> + my $read = read($in, $blob, $bytesToRead, $blobSize); >> unless (defined($read)) { >> $self->_close_cat_blob(); >> throw Error::Simple("in pipe went bad"); >> } > > Hmph. The existing code already reads in 1024-byte chunks. For no > reason, as far as I can tell, since we are just loading the blob buffer > incrementally into memory, only to then flush it all out at once. > > Why do you read at the $blobSize offset? If we are just reading in > chunks, we be able to just keep writing to the start of our small > buffer, as we flush each chunk out before trying to read more. > > IOW, shouldn't the final code look like this: > > my $bytesLeft = $size; > while ($bytesLeft > 0) { > my $buf; > my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024; > my $read = read($in, $buf, $bytesToRead); > unless (defined($read)) { > $self->_close_cat_blob(); > throw Error::Simple("unable to read cat-blob pipe"); > } > unless (print $fh $buf) { > $self->_close_cat_blob(); > throw Error::Simple("unable to write blob output"); > } > > $bytesLeft -= $read; > } > > By having the read and flush size be the same, it's much simpler. My original bugfix did just read 1024, and write 1024. That works fine and, yes, is simpler. I changed it to be more similar to the original code in case there were performance reasons for doing it that way. That was the only reason I could think of for the design, and adding the $flushSize variable means that some motivated person could easily optimize it. So far I have been too lazy to profile the two versions.... I guess I'll try a trivial git svn init; git svn fetch and check back in. Its running now. > > Your change (and my proposed code) do mean that an error during the read > operation will result in a truncated output file, rather than an empty > one. I think that is OK, though. That can happen anyway in the original > due to a failure in the "print" step. Any caller who wants to be careful > that they leave only a full file in place must either: > > 1. Check the return value of cat_blob and verify that the result has > $size bytes, and otherwise delete it. > > 2. Write to a temporary file, then once success has been returned from > cat_blob, rename the result into place. > > Neither of which is affected by this change. > > -Peff In git svn fetch (which is how I discovered it) the file being passed to cat_blob is a temporary file, which is checksummed before putting it into place. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html