On Tue, 1 May 2012, Jeff King wrote:

> On Sun, Apr 29, 2012 at 09:53:31AM -0400, Nicolas Pitre wrote:
>
> > But my remark was related to the fact that you need to double the
> > affected resources to gain marginal improvements at some point. This
> > is true of computing hardware too: eventually you need way more gates
> > and spend much more $$$ to gain some performance, and the added
> > performance is never linear with the spending.
>
> Right, I agree with that. The trick is just finding the right spot on
> that curve for each repo to maximize the reward/effort ratio.

Absolutely, at least for the default settings. However, this is not what
--aggressive is meant to be.

> > > 1. Should we bump our default window size? The numbers above show
> > >    that typical repos would benefit from jumping to 20 or even 40.
> >
> > I think this might be a good indication that the number of objects is
> > a bad metric to size the window, as I mentioned previously.
> >
> > Given that you have the test repos already, could you re-run it with
> > --window=1000 and play with --window-memory instead? I would be
> > curious to see if this provides more predictable results.
>
> It doesn't help. The git.git repo does well with about a 1m window
> limit. linux-2.6 is somewhere between 1m and 2m. But the phpmyadmin
> repo wants more like 16m. So it runs into the same issue as using
> object counts.
>
> But it's much, much worse than that. Here are the actual numbers (same
> format as before; the left-hand column is either the window size (if no
> unit) or the window-memory limit (if a k/m unit), followed by the
> resulting pack size, its percentage of the baseline --window=10 pack,
> the user CPU time, and finally its percentage of the baseline):
> [...]

Ouch! Well... so much for a good theory. I'm still really surprised and
disappointed, as I didn't expect such damage at all.

However, this is possibly a good baseline for determining a default value
for window-memory. Given your numbers, we clearly see that good packing
can be achieved with relatively little memory, so it might be a good idea
not to leave this parameter unbounded by default, in order to catch
potential pathological cases. Maybe 64M would be a good default value?
Having a repack process eat up more than 16GB of RAM because its memory
usage is unbounded is certainly not nice.

> > Maybe we could look at the size reduction within the delta search
> > loop. If the reduction quickly diminishes as tested objects are
> > further away from the target one, then the window doesn't have to be
> > very large, whereas if the reduction remains more or less constant
> > then it might be worth searching further. That could be used to
> > dynamically size the window at run time.
>
> I really like the idea of dynamically sizing the window based on what
> we find. If it works. I don't think there's any reason you couldn't
> have 50 absolutely terrible delta candidates followed by one really
> amazing delta candidate. But maybe in practice the window tends to get
> progressively worse due to the heuristics, and outliers are unlikely. I
> guess we'd have to experiment.

Yes. The idea is to keep searching as long as the results are not getting
worse fast enough. Coming up with a good way to infer that is far from
obvious, though.


Nicolas
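
For anyone reproducing these measurements, the knobs under discussion are
ordinary repack options and config keys; here is a small usage sketch of
capping the delta window by memory instead of by object count. The 1000
and 64m figures are simply the values quoted in this thread, used purely
for illustration rather than as recommendations.

  # Repack with a large object-count window but cap the memory the delta
  # search may hold (the limit applies per pack-objects thread); the
  # numbers come from the discussion above and are not recommendations.
  git repack -a -d -f --window=1000 --window-memory=64m

  # Or make the limits persistent through configuration:
  git config pack.window 1000
  git config pack.windowMemory 64m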
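
For illustration only, here is a rough standalone sketch of the stopping
rule discussed above: keep scanning window candidates while they still
improve the best delta found so far, and give up once a run of candidates
stops paying off. This is not git's pack-objects code; the function,
thresholds and data are all invented for the example.

  /*
   * Toy sketch (not git code) of the "dynamic window" idea: keep scanning
   * candidates while they keep improving the best delta, stop once a run
   * of candidates fails to improve it by a meaningful margin.
   */
  #include <stdio.h>
  #include <stddef.h>

  #define MAX_STALE 8          /* hypothetical: give up after 8 useless tries */
  #define MIN_GAIN_PERCENT 1   /* hypothetical: require at least 1% shrinkage */

  /*
   * candidate_sizes[i] is the size of the delta produced against the i-th
   * window candidate; nr is how many candidates are available.
   * Returns how many candidates were actually examined.
   */
  static size_t adaptive_delta_search(const size_t *candidate_sizes, size_t nr,
				      size_t *best_out)
  {
	size_t best = (size_t)-1;
	size_t stale = 0;	/* consecutive candidates with no real gain */
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t cur = candidate_sizes[i];

		/* "real gain" = beats the best so far by MIN_GAIN_PERCENT */
		if (best == (size_t)-1 ||
		    cur + cur * MIN_GAIN_PERCENT / 100 < best) {
			best = cur;
			stale = 0;	/* still improving: keep going */
		} else if (++stale >= MAX_STALE) {
			i++;		/* count this candidate as examined */
			break;		/* gains dried up: stop early */
		}
	}
	*best_out = best;
	return i;
  }

  int main(void)
  {
	/* made-up delta sizes: big early gains, then a long flat tail */
	const size_t sizes[] = { 900, 700, 650, 640, 639, 638, 638, 637,
				 637, 637, 636, 636, 636, 635, 635, 635 };
	size_t nr = sizeof(sizes) / sizeof(sizes[0]);
	size_t best, examined;

	examined = adaptive_delta_search(sizes, nr, &best);
	printf("examined %zu of %zu candidates, best delta size %zu\n",
	       examined, nr, best);
	return 0;
  }

On the sample data this stops after 12 of the 16 candidates, which is the
behaviour one would hope for when the gains flatten out; whether real
delta-size curves behave this nicely is exactly what would need to be
measured.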