Re: gc --aggressive

On Tue, 1 May 2012, Jeff King wrote:

> On Sun, Apr 29, 2012 at 09:53:31AM -0400, Nicolas Pitre wrote:
> 
> > But my remark was related to the fact that you need to double the 
> > affected resources to gain marginal improvements at some point.  This is 
> > true about computing hardware too: eventually you need way more gates 
> > and spend much more $$$ to gain some performance, and the added 
> > performance is never linear with the spending.
> 
> Right, I agree with that. The trick is just finding the right spot on
> that curve for each repo to maximize the reward/effort ratio.

Absolutely, at least for the default settings.  However, this is not what 
--aggressive is meant to be.

> > >   1. Should we bump our default window size? The numbers above show that
> > >      typical repos would benefit from jumping to 20 or even 40.
> > 
> > I think this might be a good indication that the number of objects is a 
> > bad metric to size the window, as I mentioned previously.
> > 
> > Given that you have the test repos already, could you re-run it with 
> > --window=1000 and play with --window-memory instead?  I would be curious 
> > to see if this provides more predictable results.
> 
> It doesn't help. The git.git repo does well with about a 1m window
> limit. linux-2.6 is somewhere between 1m and 2m. But the phpmyadmin repo
> wants more like 16m. So it runs into the same issue as using object
> counts.
> 
> But it's much, much worse than that. Here are the actual numbers (same
> format as before; left-hand column is either window size (if no unit) or
> window-memory limit (if k/m unit), followed by resulting pack size, its
> percentage of baseline --window=10 pack, the user CPU time and finally
> its percentage of the baseline):
> [...]

Ouch!  Well... so much for a good theory.  I'm still really surprised and 
disappointed, as I didn't expect such damage at all.

However, this could still be a good baseline for determining a default 
value for window-memory.  Your numbers clearly show that good packing can 
be achieved with relatively little memory, so it might be a good idea not 
to leave this parameter unbounded by default, in order to catch potential 
pathological cases.  Maybe 64M would be a good default value?  Having a 
repack process eat up more than 16GB of RAM just because its memory usage 
is unbounded is certainly not nice.
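
(For anyone who wants to bound this by hand in the meantime, the knobs 
already exist, i.e. something like:

	git repack -a -d -f --window-memory=64m

or the equivalent pack.windowMemory config variable.)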

> > Maybe we could look at the size reduction within the delta search loop.  
> > If the reduction quickly diminishes as tested objects are further away 
> > from the target one then the window doesn't have to be very large, 
> > whereas if the reduction remains more or less constant then it might be 
> > worth searching further.  That could be used to dynamically size the 
> > window at run time.
> 
> I really like the idea of dynamically sizing the window based on what we
> find. If it works. I don't think there's any reason you couldn't have 50
> absolutely terrible delta candidates followed by one really amazing
> delta candidate. But maybe in practice the window tends to get
> progressively worse due to the heuristics, and outliers are unlikely. I
> guess we'd have to experiment.

Yes.  The idea is to keep searching as long as the results are not 
degrading too quickly.  Coming up with a good way to infer that is far 
from obvious, though.
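
Just to make the idea concrete, here is a rough standalone toy -- NOT the 
actual delta search in builtin/pack-objects.c, and the chunk size of 16 
and the 1/64 cutoff are numbers pulled out of thin air -- showing one 
possible shape for such a stopping rule:

/*
 * Toy illustration of an adaptive window: scan candidates in order, and
 * every 16 candidates check how much the best delta size improved over
 * that chunk.  If the improvement fell below 1/64 of the current best,
 * assume diminishing returns and stop early.
 */
#include <stdio.h>
#include <limits.h>

typedef unsigned long (*delta_size_fn)(unsigned candidate);

static unsigned adaptive_window_scan(unsigned max_window, delta_size_fn delta_size)
{
	unsigned long best = ULONG_MAX;		/* no delta found yet */
	unsigned long best_at_chunk_start = ULONG_MAX;
	unsigned i;

	for (i = 1; i <= max_window; i++) {
		unsigned long sz = delta_size(i);

		if (sz < best)
			best = sz;

		if (i % 16 == 0) {		/* review progress every 16 candidates */
			unsigned long gain = best_at_chunk_start - best;

			if (gain < best / 64)
				break;		/* diminishing returns: give up */
			best_at_chunk_start = best;
		}
	}
	return i < max_window ? i : max_window;	/* window size actually searched */
}

/* Toy candidate stream: deltas shrink quickly at first, then flatten out. */
static unsigned long toy_delta(unsigned candidate)
{
	return candidate < 20 ? 100000 - 4000 * candidate : 30000;
}

int main(void)
{
	printf("searched %u of 1000 candidates\n",
	       adaptive_window_scan(1000, toy_delta));
	return 0;
}

With the toy numbers above it gives up after 48 of the 1000 candidates.  
Whether a chunked "recent gain" test like this is robust against your 
"50 terrible candidates then one amazing one" scenario is exactly the 
open question.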


Nicolas

