On Wed, Sep 22, 2010 at 5:00 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Wed, 22 Sep 2010, Avery Pennarun wrote:
> To not load big objects into memory, we'd have to add support for the
> core.bigFileThreshold config option in more places.

Well, it could be automatic in git-pack-objects if it was enabled when
the object is larger than window_memory_limit.  But I see your point.

Anyway, it requires more knowledge of the guts of git (plus patience
and motivation) than I currently have.  My patch was sufficient to get
my previously-un-packable repo packed on my system, which was a big
step forward.

>> +     if (window_memory_limit && entry->size > window_memory_limit)
>> +             continue;
>
> I think you should even use entry->size/2 here, or even entry->size/4.
> The reason for that is 1) you need at least 2 such similar objects in
> memory to find a possible delta, and 2) reference object to delta
> against has to be block indexed and that index table is almost the same
> size as the object itself especially on 64-bit machines.

I would be happy to re-spin my exciting two-line patch to include
whatever threshold you think is best :)

Good point about #1; dividing by two does seem like a good idea.
(Though arguably you could construct a small object from a large one
just by deleting a lot of stuff, so a small + large object could fit in
a window close to the large object's size.  I doubt this happens
frequently, though.)

As for #2, does the block index table count toward the
window_memory_limit right now?  If it doesn't, then dividing by 4
doesn't seem to make sense.  If it does, then it does make sense, I
guess.  (Though maybe dividing by 3 is a bit more generous, just in
case.)

Anyway, the window_memory_limit stuff seems pretty weakly implemented
overall; with or without my patch, it seems like you could end up using
vastly more memory than the window size.  Imagine setting the
window_memory_limit to 10MB and packing a 1GB object; it still ends up
in memory at least two or three times, for a 200x overshoot.  Without
my patch, it's about 2x worse than even that, though I don't know why.

So my question is: is it worth trying to treat window_memory_limit as
"maximum memory used by the pack operation," or should it literally be
the maximum size of the window?  If the former, well, we've got a lot
of work ahead of us, but dividing by 4 might be appropriate.  If the
latter, dividing by two is probably the most we should do.

FWIW, the test timings described in my commit message are unaffected
whether we divide by 1, 2, or 4, because as it happens, I didn't have
any objects between 25 and 100 MB in size :)

Have fun,

Avery
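
P.S. In case it helps when reviewing the re-spun patch, here is a tiny
standalone sketch of the check with the divide-by-two applied.  This is
not git source: the helper name, the constants, and main() are made up
purely for illustration; in the real patch the test is just the two
quoted lines above.

  #include <stdio.h>

  /* Return 1 if an object of `size` bytes should be skipped for delta search. */
  static int too_big_to_delta(unsigned long size, unsigned long window_memory_limit)
  {
          /*
           * Divide by 2 because at least two similar-sized objects must
           * fit in the delta window at once for a delta to be found.
           */
          return window_memory_limit && size > window_memory_limit / 2;
  }

  int main(void)
  {
          unsigned long limit = 100UL * 1024 * 1024;  /* e.g. a 100 MB window memory limit */

          printf("60MB object skipped? %d\n",
                 too_big_to_delta(60UL * 1024 * 1024, limit));  /* 1: bigger than limit/2 */
          printf("40MB object skipped? %d\n",
                 too_big_to_delta(40UL * 1024 * 1024, limit));  /* 0: still a delta candidate */
          return 0;
  }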