On Mon, 4 Oct 2010, Jonathan Nieder wrote:

> Shawn Pearce wrote:
>
> > This change only removes the deflate copy.  But due to the SHA-1
> > consistency issue I alluded to earlier, I think we're still making a
> > full copy of the file in memory before we SHA-1 it or deflate it.
>
> Hmm, I _think_ we still use mmap for that (which is why 748af44c needs
> to compare the sha1 before and after).
>
> But
>
>  1) a one-pass calculation would presumably be a little (5%?) faster

You can't do a one-pass calculation.  The first pass is required to
compute the SHA-1 of the file being added; if that corresponds to an
object we already have, the operation stops right there, as there is
actually nothing to do.  The second pass deflates the data and
recomputes the SHA-1 to make sure that what we deflated and wrote out
is still the same data.

In the case of big files, what we need to do is stream the file data
in, computing the SHA-1 and deflating it on the fly, in order to
stream it out into a temporary file which is then renamed according to
the final SHA-1.  This would allow Git to work with big files, but of
course it won't be possible to know whether the object corresponding
to the file is already known until all the work has been done,
possibly just to throw it away.  But normally big files are the
minority.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
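[Editor's illustration] The streaming approach described above can be sketched roughly as follows. This is a hypothetical Python sketch, not Git's actual C implementation: among other things it omits Git's "blob <size>\0" object header and the fan-out object directory layout, and the function name and parameters are made up.

```python
import hashlib
import os
import tempfile
import zlib

def stream_object(path, objdir, chunk_size=65536):
    """Stream a file into objdir: hash and deflate in one pass,
    write to a temp file, then rename by the final SHA-1."""
    sha = hashlib.sha1()
    deflate = zlib.compressobj()
    fd, tmp = tempfile.mkstemp(dir=objdir)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                sha.update(chunk)                   # hash the raw data...
                out.write(deflate.compress(chunk))  # ...while deflating it
            out.write(deflate.flush())
        # Only now is the object name known; rename the temp file.
        # (In real Git this is also the earliest point at which a
        # duplicate object could be detected and the work discarded.)
        final = os.path.join(objdir, sha.hexdigest())
        os.rename(tmp, final)
        return sha.hexdigest()
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)
```

The one-pass trade-off the message describes is visible here: the file is read exactly once, but a pre-existing object can only be recognized after all the hashing and deflating is complete.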