Re: gc --aggressive

[ coming late to this thread -- thanks to peff for drawing my attention to it ]

On Tue, 17 Apr 2012, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > On Tue, Apr 17, 2012 at 03:17:28PM -0700, Junio C Hamano wrote:
> >
> >> > How many cores are there on this box? Have you tried setting
> >> > pack.windowMemory to (12 / # of cores) or thereabouts?
> >> 
> >> Hrm, from the end-user's point of view, it appears that pack.windowMemory
> >> ought to mean the total without having to worry about the division of it
> >> across threads (which the implementation should be responsible for).
> >
> > Agreed. I had to look in the code to check which it meant. I'm not sure
> > we can change it without regressing existing users, though.
> 
> This is a tangent, but I noticed that the canned settings for "aggressive"
> use an arbitrarily hardcoded value of depth=250 and window=250 (tweakable
> with gc.aggressiveWindow).
> 
> Even though a shallower depth does cause base candidates with too long a
> chain hanging to be evicted prematurely while it is still in window and
> will lead to smaller memory consumption, I do not think the value of
> "depth" affects the pack-time memory consumption too much.  But the
> runtime performance of the resulting pack may not be great (in the worst
> case you would have to undelta 249 times to get to the object data).  We
> may want to loosen it a bit.
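
Here is a toy Python model that puts a number on that worst case. It is not git's pack format or delta encoding. A "delta" here just appends bytes to its base. A chain of N objects is assumed to be one full base plus N - 1 deltas, which matches the 249 figure above for a depth of 250. Reading the object at the end of a chain costs one delta application per link, so the read cost grows linearly with the depth limit.

```python
"""Toy model of delta-chain resolution cost (not git's real pack format)."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """A pack entry: a full base object, or a delta against another entry."""
    data: Optional[bytes] = None     # full content, set only on the base object
    base: Optional[Entry] = None     # delta base, set only on delta entries
    suffix: bytes = b""              # toy "delta": bytes appended to the base


def resolve(entry: Entry) -> tuple[bytes, int]:
    """Return (content, number of deltas applied) for an entry."""
    chain = []
    while entry.base is not None:    # walk down to the full base object
        chain.append(entry)
        entry = entry.base
    content = entry.data or b""
    for delta in reversed(chain):    # re-apply deltas from the base outward
        content += delta.suffix
    return content, len(chain)


def build_chain(length: int) -> Entry:
    """Build `length` objects: one base plus `length - 1` chained deltas."""
    entry = Entry(data=b"v0")
    for i in range(1, length):
        entry = Entry(base=entry, suffix=f"+{i}".encode())
    return entry


if __name__ == "__main__":
    for length in (10, 50, 250):
        _, applied = resolve(build_chain(length))
        print(f"chain of {length} objects: {applied} delta applications")
```

Under these assumptions, a chain of 250 objects needs 249 delta applications to read its last object. Real delta application is far more expensive than a byte append, which makes deep chains costlier still.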

I think people have misconceptions about the meaning of the word
"aggressive".

This option is, well, aggressive.  By definition it is not meant to be
"nice".  It is not meant to be fast, or light on memory usage, etc.
It means "do as much damage as you can" to reduce the pack size.

If people are using it every night then they must be masochists, or 
attracted by violence, or getting a bit too casual with word 
definitions.

So if being --aggressive hurts, then don't do it.

If people want a loosened version, it would be more appropriate to
introduce a --mild, or --bold, or --disruptive option.  In the same
vein, an --insane option could be introduced to go even further than
--aggressive.

That being said, this is no excuse for regressions.  If git is eating
up much more memory than it used to, given the same repository and
repacking parameters as before, then this certainly needs fixing.  But
making --aggressive less so is not a fix.

> Also it might make sense to make the window size a bit more flexible
> depending on the nature of your history (you would get bigger benefit with
> larger window when your history has fine grained commits; if there are not
> many few-liner commits, larger window may not help you that much).

How do you detect the nature of a repository's history?  That's the
hard part.  It would have to be auto-detected, as most users won't make
a good guess for the best parameter value to use.

Anyway, I think that the window size in terms of objects is a bad
parameter.  Historically that is the first thing we implemented, but
the window _memory_ usage is probably a better setting to use.  The
delta search cost is directly proportional to the amount of data to
process, and that can be controlled with --window-memory, which lets
the number of objects in the window scale up and down.  Keeping the
number of objects constant makes memory usage essentially arbitrary,
since it depends on the repository content, and the computing cost to
process it is highly unpredictable.  This is very counter-intuitive for
users.
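
Here is a hypothetical Python sketch of a window bounded primarily by memory. It keeps an object-count cap only as a safety limit. Several details are assumptions for illustration and do not come from pack-objects' actual code:

* the class name,
* oldest-first eviction,
* the rule that the newest object is always kept, even if it alone exceeds the budget.

The point is that the number of delta candidates adapts to object sizes while the memory held stays bounded.

```python
"""Sketch: a delta-search window bounded by memory, with an object-count cap.

Hypothetical illustration; not git's pack-objects implementation.
"""
from __future__ import annotations

from collections import deque


class DeltaWindow:
    def __init__(self, max_objects: int, max_bytes: int) -> None:
        self.max_objects = max_objects  # safety cap against pathological cases
        self.max_bytes = max_bytes      # primary, user-facing memory limit
        self.entries = deque()          # (name, size) pairs, oldest first
        self.used = 0                   # bytes currently held in the window

    def push(self, name: str, size: int) -> list[str]:
        """Add an object, then evict the oldest until both limits hold.

        Returns the evicted names. The newest object is always kept, even if
        it alone exceeds max_bytes.
        """
        self.entries.append((name, size))
        self.used += size
        evicted = []
        while len(self.entries) > 1 and (
            len(self.entries) > self.max_objects or self.used > self.max_bytes
        ):
            old_name, old_size = self.entries.popleft()
            self.used -= old_size
            evicted.append(old_name)
        return evicted

    def candidates(self) -> list[str]:
        """Objects currently held in the window (delta base candidates)."""
        return [name for name, _ in self.entries]


if __name__ == "__main__":
    window = DeltaWindow(max_objects=1000, max_bytes=1_000_000)
    for i in range(100):
        window.push(f"small{i}", 10_000)
    print(len(window.candidates()))                # 100
    evicted = window.push("big", 900_000)
    print(len(evicted), len(window.candidates()))  # 90 11
```

With a 1,000,000-byte budget, one hundred 10,000-byte objects all fit. Pushing a 900,000-byte object then evicts the 90 oldest and leaves 11 candidates. Memory stays fixed while the object count moves, which is the opposite of a fixed object-count window.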

Right now the window is limited by default to 10 objects, and window
memory usage is unlimited.  This could be reworked so that the object
count, while still capped to avoid pathological cases, could be much
higher, and window memory usage would always be limited by default.
That default limit could be scaled according to the resources available
on the system.  And if the user wants to tune this, a memory usage
parameter is much easier to understand, and its effect on system load
is more directly observable.
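
Here is a hypothetical Python sketch of such a resource-scaled default. It also folds in Junio's point earlier in the thread that pack.windowMemory ought to mean a total across threads rather than a per-thread value. The helper names, the 25% fraction, and the 64 MiB floor and 4 GiB ceiling are assumptions for illustration only. This is not a description of what git does.

```python
"""Hypothetical default window-memory policy (not current git behavior).

Assumptions: the default is a fixed fraction of physical RAM clamped to a
range, and the user-facing value is a total divided evenly across threads.
"""
import os

MIB = 1024 * 1024


def physical_memory_bytes() -> int:
    """Physical RAM via POSIX sysconf; assume 1 GiB where unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return 1024 * MIB
    if pages <= 0 or page_size <= 0:
        return 1024 * MIB
    return pages * page_size


def default_window_memory(total_ram: int, fraction: float = 0.25,
                          floor: int = 64 * MIB,
                          ceiling: int = 4096 * MIB) -> int:
    """Total budget shared by all delta-search threads, clamped to a range."""
    return max(floor, min(ceiling, int(total_ram * fraction)))


def per_thread_limit(total_budget: int, threads: int) -> int:
    """Split a total budget evenly, so users never divide by core count."""
    return total_budget // max(1, threads)


if __name__ == "__main__":
    threads = os.cpu_count() or 1
    total = default_window_memory(physical_memory_bytes())
    print(f"{total // MIB} MiB total, "
          f"{per_thread_limit(total, threads) // MIB} MiB per thread")
```

For example, with 16 GiB of RAM, 25% is 4096 MiB, right at the assumed ceiling. With 8 threads that gives 512 MiB per thread. The numbers are placeholders. The design choice being illustrated is only that the user sets one total and the program, not the user, divides it across threads.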


Nicolas