Re: [PATCH] Fix in Git.pm cat_blob crashes on large files

Jeff King <peff@xxxxxxxx> · Thu, 21 Feb 2013 18:24:48 -0500

On Thu, Feb 21, 2013 at 03:18:40PM -0800, Joshua Clayton wrote:

> > By having the read and flush size be the same, it's much simpler.
> 
> My original bugfix did just read 1024, and write 1024. That works fine
> and, yes, is simpler.
> I changed it to be more similar to the original code in case there
> were performance reasons for doing it that way.
> That was the only reason I could think of for the design, and adding
> the $flushSize variable means that
> some motivated person could easily optimize it.
> 
> So far I have been too lazy to profile the two versions....
> I guess I'll try a trivial git svn init; git svn fetch and check back in.
> Its running now.

I doubt it will make much of a difference; we are already writing to a
perl filehandle, so it will be buffered there (which I assume is 4K, but
I haven't checked). And your version retains the 1024-byte read. I do
think 1024 is quite low for this sort of descriptor-to-descriptor
copying. I'd be tempted to just bump that 1024 to 64K.

> In git svn fetch (which is how I discovered it) the file being passed
> to cat_blob is a temporary file, which is checksummed before putting
> it into place.

Great. There may be other callers outside of our tree, of course, but I
think it's pretty clear that the responsibility is on the caller to make
sure the function succeeded. We are changing what gets put on the output
stream for various error conditions, but ultimately that is an
implementation detail that the caller should not be depending on.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html