On Wed, 10 Dec 2008, Jonathan Blanton wrote:
>
> I'm using Git for a project that contains huge (multi-gigabyte) files.
> I need to track these files, but with some of the really big ones,
> git-add aborts with the message "fatal: Out of memory, malloc failed".

git is _really_ not designed for huge files. By design - good or bad -
git does pretty much all single-file operations with the whole file in
memory as one single allocation.

Now, some of that is hard to fix - or at least would generate much more
complex code. The _particular_ case of "git add" could be fixed without
undue pain, but it's not entirely trivial either.

The main offender is probably "index_fd()", which just mmap's the whole
file in one go and then calls write_sha1_file(), which really expects it
to be one single memory area, both for the initial SHA1 creation and for
the compression and writing out of the result. Changing that to do big
files in pieces would not be _too_ painful, but it's not just a couple
of lines either.

However, git performance with big files would never be wonderful, and
things like "git diff" would still end up reading not just the whole
file, but _both_versions_ at the same time. Marking the big files as
no-diff might help, though.

		Linus
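
To make the "in pieces" part concrete, here is a minimal, self-contained
sketch of hashing a git-style blob ("blob <size>\0" header followed by
the file data) in fixed-size chunks, using OpenSSL's SHA-1. This is not
git's actual index_fd()/write_sha1_file() code; the helper name
hash_blob_in_pieces and the 8 MiB chunk size are invented for
illustration, and a real conversion would also have to stream the zlib
compression and the object write the same way.

/*
 * A minimal standalone sketch (not git's code): hash a blob in
 * fixed-size pieces instead of one big mmap.  The helper name and the
 * 8 MiB chunk size are arbitrary; a real fix inside index_fd() /
 * write_sha1_file() would also need to stream the zlib compression.
 *
 * Build: cc sketch.c -lcrypto
 */
#include <openssl/sha.h>
#include <stdio.h>
#include <sys/stat.h>

#define CHUNK (8 * 1024 * 1024)		/* read the file 8 MiB at a time */

static unsigned char buf[CHUNK];

static int hash_blob_in_pieces(FILE *in, unsigned long long size,
			       unsigned char sha1[20])
{
	SHA_CTX ctx;
	char hdr[64];
	int hdrlen;
	size_t n;

	/* A loose blob is hashed as "blob <size>\0" followed by the data. */
	hdrlen = snprintf(hdr, sizeof(hdr), "blob %llu", size) + 1;

	SHA1_Init(&ctx);
	SHA1_Update(&ctx, hdr, hdrlen);

	/* Feed the contents to the hash one piece at a time, so peak
	 * memory use stays at one chunk rather than the whole file. */
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
		SHA1_Update(&ctx, buf, n);

	SHA1_Final(sha1, &ctx);
	return ferror(in) ? -1 : 0;
}

int main(int argc, char **argv)
{
	struct stat st;
	unsigned char sha1[20];
	FILE *f;
	int i;

	if (argc != 2 || stat(argv[1], &st) < 0 || !(f = fopen(argv[1], "rb")))
		return 1;
	if (hash_blob_in_pieces(f, (unsigned long long)st.st_size, sha1) < 0)
		return 1;
	for (i = 0; i < 20; i++)
		printf("%02x", sha1[i]);
	printf("\n");
	fclose(f);
	return 0;
}

Because it uses the same "blob <size>\0" convention, this should print
the same object name that "git hash-object <file>" would. As for the
no-diff marking, that is just an attributes entry such as "*.iso -diff"
in .gitattributes, which makes git treat matching files as binary for
diff purposes instead of loading both versions to produce a textual
diff.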