On Mon, 4 Oct 2010, Jonathan Nieder wrote:

> Shawn Pearce wrote:
>
> > This change only removes the deflate copy.  But due to the SHA-1
> > consistency issue I alluded to earlier, I think we're still making a
> > full copy of the file in memory before we SHA-1 it or deflate it.
>
> Hmm, I _think_ we still use mmap for that (which is why 748af44c needs
> to compare the sha1 before and after).
>
> But
>
>  1) a one-pass calculation would presumably be a little (5%?) faster

You can't do a one-pass calculation.  The first pass is required to
compute the SHA-1 of the file being added; if that corresponds to an
object we already have, the operation stops right there, as there is
actually nothing to do.  The second pass deflates the data and
recomputes the SHA-1 to make sure that what we deflated and wrote out
is still the same data.

In the case of big files, what we need to do is stream the file data
in, computing the SHA-1 and deflating it on the fly, in order to
stream it out into a temporary file which is then renamed according to
the final SHA-1.  This would allow Git to work with big files, but of
course it won't be possible to know whether the object corresponding
to the file is already known until all the work has been done,
possibly just to throw it away.  But normally big files are the
minority.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
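[Editor's illustration] The streaming approach described above can be sketched roughly as follows. This is a hypothetical Python sketch, not Git's actual C implementation: among other things it omits Git's "blob <size>\0" object header and the fan-out object directory layout, and the function name and parameters are made up.

```python
import hashlib
import os
import tempfile
import zlib

def stream_object(path, objdir, chunk_size=65536):
    """Stream a file into objdir: hash and deflate in one pass,
    write to a temp file, then rename by the final SHA-1."""
    sha = hashlib.sha1()
    deflate = zlib.compressobj()
    fd, tmp = tempfile.mkstemp(dir=objdir)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                sha.update(chunk)                   # hash the raw data...
                out.write(deflate.compress(chunk))  # ...while deflating it
            out.write(deflate.flush())
        # Only now is the object name known; rename the temp file.
        # (In real Git this is also the earliest point at which a
        # duplicate object could be detected and the work discarded.)
        final = os.path.join(objdir, sha.hexdigest())
        os.rename(tmp, final)
        return sha.hexdigest()
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)
```

The one-pass trade-off the message describes is visible here: the file is read exactly once, but a pre-existing object can only be recognized after all the hashing and deflating is complete.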