On Wed, 22 Sep 2010, Avery Pennarun wrote:

> With very large objects, just loading them into the delta window wastes a
> huge amount of memory. In one repo, I have some objects around 1GB in size,
> and git-pack-objects seems to require about 8x that in order to deltify it,
> even when the window memory limit is small (eg. --window-memory=100M). With
> this patch, the maximum memory usage is about halved.
>
> Perhaps more importantly, however, disabling deltification for large objects
> seems to reduce memory thrashing when you can't fit multiple large objects
> into physical RAM at once. It seems to be the difference between "never
> finishes" and "finishes eventually" for me.
>
> Test:
>
> I created a test repo with 10 sequential commits containing a bunch of
> nearly-identical 110MB files (just appending a line each time).
>
> Without this patch:
>
>   $ /usr/bin/time git repack -a --window-memory=100M
>
>   Counting objects: 43, done.
>   warning: suboptimal pack - out of memory
>   Compressing objects: 100% (43/43), done.
>   Writing objects: 100% (43/43), done.
>   Total 43 (delta 14), reused 0 (delta 0)
>   42.79user 1.07system 0:44.53elapsed 98%CPU (0avgtext+0avgdata
>   866736maxresident)k
>   0inputs+2752outputs (0major+718471minor)pagefaults 0swaps
>
> With this patch:
>
>   $ /usr/bin/time -a git repack -a --window-memory=100M
>
>   Counting objects: 43, done.
>   Compressing objects: 100% (30/30), done.
>   Writing objects: 100% (43/43), done.
>   Total 43 (delta 14), reused 0 (delta 0)
>   35.86user 0.65system 0:36.30elapsed 100%CPU (0avgtext+0avgdata
>   437568maxresident)k
>   0inputs+2768outputs (0major+366137minor)pagefaults 0swaps
>
> It apparently still uses about 4x the memory of the largest object, which is
> about twice as good as before, though still kind of awful. (Ideally, we
> wouldn't even load the entire large object into memory even once.)

To not load big objects into memory, we'd have to add support for the
core.bigFileThreshold config option in more places.

>  builtin/pack-objects.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0e81673..9f1a289 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1791,6 +1791,9 @@ static void prepare_pack(int window, int depth)
>  		if (entry->size < 50)
>  			continue;
>
> +		if (window_memory_limit && entry->size > window_memory_limit)
> +			continue;
> +

I think you should even use entry->size/2 here, or even entry->size/4.
The reason for that is 1) you need at least 2 such similar objects in
memory to find a possible delta, and 2) the reference object to delta
against has to be block indexed, and that index table is almost the same
size as the object itself, especially on 64-bit machines.


Nicolas
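
To make the size/2 and size/4 reasoning above concrete, here is a minimal
standalone sketch. It is not code from git: the helper name
worth_deltifying() and its cost_factor parameter are hypothetical, and it
assumes the intent is to skip any object whose delta working set (roughly
cost_factor copies of the object) would not fit in the window memory limit.
A cost_factor of 1 corresponds to the check in the quoted patch; 2 and 4
correspond to the two reasons given above.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of git.  Returns 1 if an object of
 * `size` bytes should still be tried for deltification, 0 if it should
 * be skipped.  `cost_factor` is an assumed multiplier for how many
 * object-sized buffers the delta search needs at once.
 */
static int worth_deltifying(unsigned long size,
			    unsigned long window_memory_limit,
			    unsigned int cost_factor)
{
	assert(cost_factor > 0);

	/* Same small-object cutoff as the context lines in prepare_pack(). */
	if (size < 50)
		return 0;

	/*
	 * A limit of 0 means "no limit".  Dividing the limit instead of
	 * multiplying the size avoids overflow for very large objects.
	 */
	if (window_memory_limit && size > window_memory_limit / cost_factor)
		return 0;

	return 1;
}
```

With a 100MB limit, a 60MB object passes with cost_factor 1 but is skipped
with cost_factor 2, and a 30MB object is skipped only with cost_factor 4.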