Re: [PATCH] pack-objects: never deltify objects bigger than window_memory_limit.

On Wed, 22 Sep 2010, Avery Pennarun wrote:

> With very large objects, just loading them into the delta window wastes a
> huge amount of memory.  In one repo, I have some objects around 1GB in size,
> and git-pack-objects seems to require about 8x that in order to deltify them,
> even when the window memory limit is small (e.g. --window-memory=100M).  With
> this patch, the maximum memory usage is about halved.
> 
> Perhaps more importantly, however, disabling deltification for large objects
> seems to reduce memory thrashing when you can't fit multiple large objects
> into physical RAM at once.  It seems to be the difference between "never
> finishes" and "finishes eventually" for me.
> 
> Test:
> 
> I created a test repo with 10 sequential commits containing a bunch of
> nearly-identical 110MB files (just appending a line each time).
> 
> Without this patch:
> 
>     $ /usr/bin/time git repack -a --window-memory=100M
> 
>     Counting objects: 43, done.
>     warning: suboptimal pack - out of memory
>     Compressing objects: 100% (43/43), done.
>     Writing objects: 100% (43/43), done.
>     Total 43 (delta 14), reused 0 (delta 0)
>     42.79user 1.07system 0:44.53elapsed 98%CPU (0avgtext+0avgdata
>       866736maxresident)k
>       0inputs+2752outputs (0major+718471minor)pagefaults 0swaps
> 
> With this patch:
> 
>     $ /usr/bin/time -a git repack -a --window-memory=100M
> 
>     Counting objects: 43, done.
>     Compressing objects: 100% (30/30), done.
>     Writing objects: 100% (43/43), done.
>     Total 43 (delta 14), reused 0 (delta 0)
>     35.86user 0.65system 0:36.30elapsed 100%CPU (0avgtext+0avgdata
>       437568maxresident)k
>       0inputs+2768outputs (0major+366137minor)pagefaults 0swaps
> 
> It apparently still uses about 4x the memory of the largest object, which is
> about twice as good as before, though still kind of awful.  (Ideally, we
> wouldn't load the entire large object into memory even once.)

To avoid loading big objects into memory at all, we'd have to add support 
for the core.bigFileThreshold config option in more places.
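
As a sketch of that direction only (assuming the value behind 
core.bigFileThreshold were exposed to this code, e.g. as a 
big_file_threshold variable -- it isn't honored here today), the same 
loop could refuse to even consider such blobs as delta candidates:

		/*
		 * Sketch only: exclude blobs above core.bigFileThreshold
		 * from the delta search entirely.
		 */
		if (big_file_threshold && entry->size >= big_file_threshold)
			continue;

That only spares the delta search, though; writing such an object out 
still reads it into memory in one piece, so truly never loading it whole 
would need streaming support much deeper down.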

>  builtin/pack-objects.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0e81673..9f1a289 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1791,6 +1791,9 @@ static void prepare_pack(int window, int depth)
>  		if (entry->size < 50)
>  			continue;
>  
> +		if (window_memory_limit && entry->size > window_memory_limit)
> +			continue;
> +

I think you should even use entry->size/2 here, or even entry->size/4.  
The reason for that is 1) you need at least 2 such similar objects in 
memory to find a possible delta, and 2) the reference object to delta 
against has to be block indexed, and that index table is almost the same 
size as the object itself, especially on 64-bit machines.
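
Concretely, one way to read that (a sketch under my reading, not a 
tested change): the working set for a single delta attempt is roughly 
two objects plus their delta indexes, i.e. on the order of four times 
entry->size, so the cutoff could be tightened along these lines:

		/*
		 * Sketch only: count a delta candidate as roughly 4x its
		 * size (two objects in memory, each with a delta index of
		 * about the same size again) and skip it when that working
		 * set cannot fit within the window memory budget.
		 */
		if (window_memory_limit &&
		    entry->size > window_memory_limit / 4)
			continue;

The exact divisor is a heuristic; the point is that the limit has to 
cover quite a bit more than one copy of the object's raw size.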


Nicolas

