On Sun, May 15, 2011 at 17:30, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Traditionally, git always read the full contents of an object in memory
> before performing various operations on it, e.g. comparing for diff,
> writing it to the working tree, etc. A huge blob that you cannot fit
> in memory was very cumbersome to handle.
[...]
> Interested parties may want to measure the performance impact of the last
> three patches. The series deliberately ignores core.bigFileThreshold and
> lets small and large blobs alike go through the streaming_write_entry()
> codepath, but it _might_ turn out that we would want to use the new code
> only for large-ish blobs.

FWIW, in JGit we control this by looking at the object size and comparing it
to the core.streamFileThreshold configuration variable. For any object below
this size we allocate a buffer, unpack into it, and return the buffer to the
caller. Only objects above the threshold use the streaming code paths.

There is a performance difference, at least for us in Java. Most of the
overhead seems to come from running zlib inflate() with a tiny buffer rather
than the full destination buffer, which probably has to do with the cost of
crossing from Java bytecode through JNI into the libz library on each call.

-- 
Shawn.
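
To make the decision concrete, here is a rough sketch in plain Java of the
"buffer small objects, stream large ones" split described above. It uses
java.util.zip directly rather than JGit's actual internals, and the class,
method names, and the 1 MiB default threshold are made up for illustration:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class ThresholdSketch {
	// Hypothetical default; the real value would come from core.streamFileThreshold.
	static final long STREAM_FILE_THRESHOLD = 1024 * 1024;

	// Small object: inflate the whole thing into one buffer and hand it back.
	static byte[] inflateWhole(byte[] compressed, int size)
			throws DataFormatException {
		Inflater inf = new Inflater();
		inf.setInput(compressed);
		byte[] dst = new byte[size];
		int n = 0;
		while (n < size && !inf.finished()) {
			int r = inf.inflate(dst, n, size - n);
			if (r == 0 && inf.needsInput())
				throw new DataFormatException("unexpected end of deflated data");
			n += r;
		}
		inf.end();
		return dst;
	}

	// Large object: return a stream so the caller never holds the whole blob in memory.
	static InputStream openStream(byte[] compressed) {
		return new InflaterInputStream(new ByteArrayInputStream(compressed));
	}

	// Buffer below the threshold, stream at or above it.
	static Object open(byte[] compressed, long size) throws DataFormatException {
		if (size < STREAM_FILE_THRESHOLD)
			return inflateWhole(compressed, (int) size);
		return openStream(compressed);
	}
}

The per-call JNI cost mentioned above is exactly why the buffered path hands
inflate() the full destination buffer in as few calls as possible, while the
streaming path accepts many small inflate() calls in exchange for bounded
memory use.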