Re: large files and low memory

Nicolas Pitre wrote:

> You can't do a one-pass  calculation.  The first one is required to 
> compute the SHA1 of the file being added, and if that corresponds to an 
> object that we already have then the operation stops right there as 
> there is actually nothing to do.

Ah.  Thanks for the reminder.

> In the case of big files, what we need to do is to stream the file data 
> in, compute the SHA1 and deflate it, in order to stream it out into a 
> temporary file, then rename it according to the final SHA1.  This would 
> allow Git to work with big files, but of course it won't be possible to 
> know if the object corresponding to the file is already known until all 
> the work has been done, possibly just to throw it away.

To make sure I understand correctly: are you suggesting that for big
files we should skip the first pass?

I suppose that makes sense: for small files, git has historically had
to expect that applying a patch will reproduce a postimage matching an
object it already has, but for typical big files:

 - once you've computed the SHA1, you've already invested a noticeable
   amount of time.
 - emailing patches around is impractical, making "git am" and friends
   less important.
 - hopefully git or zlib can notice when a file is incompressible, so
   the deflate does not cost much in that case.
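
For what it's worth, here is a rough sketch of the streaming you
describe, written against OpenSSL's SHA1 and zlib rather than git's own
helpers: each chunk read from the file is fed to both the hash and
deflate, the compressed stream goes to a caller-chosen temporary file,
and the caller would then rename that file to the path derived from the
final SHA1.  The function name, the tmppath argument and the chunk size
are invented for illustration; note that git needs the blob size up
front (e.g. from fstat) for the "blob <size>\0" header, so it is taken
as a parameter here.

/*
 * Rough sketch, not git's code: stream a file through SHA1 and
 * deflate in one pass, writing the compressed result to a temporary
 * file that the caller later renames to the name derived from the
 * final SHA1.  Build with -lcrypto -lz.
 */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
#include <zlib.h>

#define CHUNK (1 << 16)

/* Deflate one buffer and append the output to 'out'. */
static int deflate_chunk(z_stream *zs, FILE *out,
                         const unsigned char *buf, size_t len, int flush)
{
        unsigned char zbuf[CHUNK];

        zs->next_in = (unsigned char *)buf;
        zs->avail_in = len;
        do {
                zs->next_out = zbuf;
                zs->avail_out = sizeof(zbuf);
                if (deflate(zs, flush) == Z_STREAM_ERROR)
                        return -1;
                if (fwrite(zbuf, 1, sizeof(zbuf) - zs->avail_out, out) !=
                    sizeof(zbuf) - zs->avail_out)
                        return -1;
        } while (zs->avail_out == 0);
        return 0;
}

/*
 * Stream 'in' (whose size is already known, e.g. from fstat) into
 * 'tmppath'; the SHA1 of "blob <size>\0" plus the file data ends up
 * in 'sha1'.  The caller then renames 'tmppath' accordingly.
 */
int stream_blob(FILE *in, unsigned long size, const char *tmppath,
                unsigned char sha1[SHA_DIGEST_LENGTH])
{
        unsigned char buf[CHUNK];
        char hdr[32];
        int hdrlen, ret = 0;
        size_t n;
        SHA_CTX ctx;
        z_stream zs;
        FILE *out = fopen(tmppath, "wb");

        if (!out)
                return -1;

        /* The object header, including its terminating NUL. */
        hdrlen = snprintf(hdr, sizeof(hdr), "blob %lu", size) + 1;

        SHA1_Init(&ctx);
        memset(&zs, 0, sizeof(zs));
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK) {
                fclose(out);
                return -1;
        }

        SHA1_Update(&ctx, hdr, hdrlen);
        ret = deflate_chunk(&zs, out, (unsigned char *)hdr, hdrlen, Z_NO_FLUSH);

        /* One read of the file: hash and compress each chunk as it goes by. */
        while (!ret && (n = fread(buf, 1, sizeof(buf), in)) > 0) {
                SHA1_Update(&ctx, buf, n);
                ret = deflate_chunk(&zs, out, buf, n, Z_NO_FLUSH);
        }
        if (!ret)
                ret = deflate_chunk(&zs, out, NULL, 0, Z_FINISH);

        SHA1_Final(sha1, &ctx);
        deflateEnd(&zs);
        if (fclose(out))
                ret = -1;
        return ret;
}

The downside is exactly the one you point out: if the object turns out
to exist already, all of that hashing and deflating was done only to be
thrown away.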

