on Tue Jan 27 2009, "Shawn O. Pearce" <spearce-AT-spearce.org> wrote:

> David Abrahams <dave@xxxxxxxxxxxx> wrote:
>> I've been abusing Git for a purpose it wasn't intended to serve:
>> archiving a large number of files with many duplicates and
>> near-duplicates. Every once in a while, when trying to do something
>> really big, it tells me "malloc failed" and bails out (I think it's
>> during "git add", but because of the way I issued the commands I can't
>> tell: it could have been a commit or a gc). This is on a 64-bit Linux
>> machine with 8G of RAM and plenty of swap space, so I'm surprised.
>>
>> Git is doing an amazing job at archiving and compressing all this
>> stuff I'm putting in it, but I have to do it a wee bit at a time or
>> it craps out. Bug?
>
> No, not really. Above you said you are "abusing git for a purpose
> it wasn't intended to serve"...

Absolutely; I want to be upfront about that :-)

> Git was never designed to handle many large binary blobs of data.

They're largely text blobs, although there definitely is a fair share
of binaries.

> It was mostly designed for source code, where the majority of the
> data stored in it is some form of text file written by a human.
>
> By their very nature these files need to be relatively short (e.g.
> under 1 MB each), as no human can sanely maintain a text file that
> large without breaking it apart into smaller files (like the source
> code for an operating system kernel).
>
> As a result of this approach, the git code assumes it can malloc()
> at least two blocks large enough for each file: one for the fully
> decompressed content, and another for the fully compressed content.
> Try doing "git add" on a large file and it's very likely malloc
> will fail due to ulimit issues, or you just don't have enough
> memory/address space to go around.

Oh, so maybe I'm getting hit by ulimit; I didn't think of that. I
could raise my ulimit to try to get around it.

> git gc likewise needs a good chunk of memory, but it shouldn't
> usually report "malloc failed". Usually in git gc, if a malloc fails
> it prints a warning and degrades the quality of its data compression.
> But there are critical bookkeeping data structures where we must be
> able to malloc the memory, and if those fail because we've already
> exhausted the heap early on, then yeah, it can fail too.

Thanks much for that, and for reminding me about ulimit. For the
archives, a few sketches of what I think is going on follow.
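If I've understood the explanation, the "git add" failure mode boils
down to the pattern below. This is my own illustration on top of plain
zlib (which git does use for compression); the function and variable
names are made up, and it's a sketch, not git's actual code:

    /* Two whole-file allocations per added file: one for the raw
     * content, one for the compressed result.  Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int deflate_whole_file(const char *path)
    {
        FILE *f = fopen(path, "rb");
        long size;
        unsigned char *raw, *packed;
        uLongf packed_len;

        if (!f)
            return -1;
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);

        raw = malloc(size);            /* block #1: full decompressed content */
        if (!raw || fread(raw, 1, size, f) != (size_t)size) {
            fclose(f);
            return -1;                 /* "malloc failed" can land here... */
        }
        fclose(f);

        packed_len = compressBound(size);
        packed = malloc(packed_len);   /* block #2: full compressed content */
        if (!packed) {
            free(raw);
            return -1;                 /* ...or here, once the heap is tight */
        }

        compress2(packed, &packed_len, raw, size, Z_BEST_SPEED);
        /* a real version would write `packed` out somewhere */
        free(raw);
        free(packed);
        return 0;
    }

    int main(int argc, char **argv)
    {
        return argc > 1 ? deflate_whole_file(argv[1]) : 2;
    }

So for a big enough file, both allocations have to succeed at once,
which is exactly where an address-space cap would bite.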
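And to check whether ulimit is really what's biting me: the soft
address-space limit (what "ulimit -v" controls in the shell) can be
read, and raised up to the hard limit, like so -- again just a sketch:

    /* Print the soft address-space limit, then try raising it to the
     * hard limit (allowed without privileges).  Minimal sketch. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_AS, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("address space soft limit: unlimited\n");
        else
            printf("address space soft limit: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur);

        rl.rlim_cur = rl.rlim_max;     /* soft limit up to the hard limit */
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            perror("setrlimit");
        return 0;
    }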
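Finally, the "degrades the quality of its data compression" behavior
sounds like a retry-smaller pattern. My guess at the shape of it (not
the real git gc code):

    /* "Degrade instead of die": try the preferred window size and
     * halve it until malloc succeeds.  Pattern sketch only. */
    #include <stdio.h>
    #include <stdlib.h>

    static void *alloc_window(size_t *wanted)
    {
        while (*wanted) {
            void *p = malloc(*wanted);
            if (p)
                return p;          /* got a (possibly smaller) window */
            *wanted /= 2;          /* degrade compression quality */
            fprintf(stderr, "warning: shrinking window to %zu bytes\n",
                    *wanted);
        }
        return NULL;               /* truly exhausted: the fatal path */
    }

    int main(void)
    {
        size_t want = (size_t)1 << 44;  /* deliberately absurd: 16 TiB */
        void *w = alloc_window(&want);
        printf("ended up with %zu bytes\n", w ? want : (size_t)0);
        free(w);
        return 0;
    }

The critical bookkeeping structures Shawn mentions would be the
allocations with no such fallback.

Cheers,

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com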