On Mon, Oct 4, 2010 at 11:24 AM, Joshua Jensen <jjensen@xxxxxxxxxxxxxxxxx> wrote:
>> On Mon, Oct 4, 2010 at 2:20 AM, Enrico Weigelt <weigelt@xxxxxxxx> wrote:
>>>
>>> when adding files which are larger than available physical memory,
>>> git performs very slowly.
>>
>> The mmap() isn't the problem. It's the allocation of a buffer that is
>> larger than the file in order to hold the result of deflating the file
>> before it gets written to disk.
...
>> This is a known area in Git where big files aren't handled well.
>
> As a curiosity, I've always done streaming decompression with zlib using
> minimal buffer sizes (64k, perhaps). I'm sure there is a good reason why
> Git doesn't do this (delta application?). Do you know what it is?

Laziness. Git originally assumed it would only be used for smaller
source files written by humans. It's easier to write the code around a
single malloc'd buffer than to stream it.

We'd like to fix it, but it's harder than it sounds. Today we copy the
file into a buffer before we deflate and compute the SHA-1, since this
prevents a consistency error if the file is modified between those two
stages.

--
Shawn.
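
[To make the current scheme Shawn describes concrete, here is a minimal
sketch of "copy the file into a buffer, then hash and deflate from that
copy". It assumes OpenSSL for the SHA-1 and zlib's one-shot compress();
git actually carries its own SHA-1 code, so this is illustration only,
with error handling pared down.]

#include <stdlib.h>
#include <openssl/sha.h>
#include <zlib.h>

/*
 * Both stages read from "buf", never from the file itself, so a
 * writer racing with us cannot make the object name disagree with
 * the compressed bytes we store.
 */
static int hash_and_deflate(const unsigned char *buf, uLong len,
			    unsigned char sha1[SHA_DIGEST_LENGTH],
			    Bytef **zout, uLongf *zlen)
{
	SHA1(buf, len, sha1);

	*zlen = compressBound(len);	/* worst case is slightly larger */
	*zout = malloc(*zlen);		/* than the input: this is the   */
	if (!*zout)			/* allocation that hurts once    */
		return -1;		/* files exceed physical memory  */
	return compress(*zout, zlen, buf, len) == Z_OK ? 0 : -1;
}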
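
[For comparison, the small-buffer streaming Joshua mentions is routine
with zlib; he describes the inflate side, but the pattern is the same
in both directions. A minimal sketch of the deflate side with 64k
buffers, modeled on zlib's zpipe.c example and with error handling
trimmed:]

#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 65536	/* 64k, the kind of buffer size Joshua mentions */

/*
 * Deflate "in" to "out" one CHUNK at a time; memory use stays
 * constant no matter how large the input file is.
 */
static int stream_deflate(FILE *in, FILE *out)
{
	unsigned char inbuf[CHUNK], outbuf[CHUNK];
	z_stream strm;
	int flush, ret;

	memset(&strm, 0, sizeof(strm));
	if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
		return -1;

	do {
		strm.avail_in = fread(inbuf, 1, CHUNK, in);
		strm.next_in = inbuf;
		flush = feof(in) ? Z_FINISH : Z_NO_FLUSH;
		do {
			strm.avail_out = CHUNK;
			strm.next_out = outbuf;
			ret = deflate(&strm, flush);
			fwrite(outbuf, 1, CHUNK - strm.avail_out, out);
		} while (strm.avail_out == 0);
	} while (flush != Z_FINISH);

	deflateEnd(&strm);
	return ret == Z_STREAM_END ? 0 : -1;
}

[The catch Shawn points to is not the deflate loop itself: once the
data is streamed rather than held in one buffer, git must still
guarantee that the SHA-1 naming the object and the deflated stream
were computed from exactly the same bytes.]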