On Tue, Jul 12, 2011 at 10:19 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Jul 11, 2011 at 23:10, Tay Ray Chuan <rctay89@xxxxxxxxx> wrote:
>> [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram
>
> Do we need sampling at all for histogram? Can you skip it?

Sampling is done to estimate the number of lines in the file. The
estimate is then used to preallocate memory for the list of records.
(It is only a guess; if we find more records, we allocate more memory.)
Preallocating saves on malloc() calls, which gives a performance boost.

But sampling has its costs: previously, we ran up to 256
memchr('\n') calls within an mmfile "block". For histogram diff, we cut
the cap down to 20. (But not for the other diff algorithms; see the
relevant patch text for more.)

I think this gives us a good balance between the time spent guessing
lines and the time gained from preallocating memory.

--
Cheers,
Ray Chuan
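
For illustration, the guess-then-preallocate idea might be sketched roughly as below. This is not the xdiff code: guess_lines() and its parameters are hypothetical names, and the extrapolation (file size divided by the average sampled line length) is an assumption about how such an estimate could be made. The `sample` argument plays the role of the cap discussed above (256 previously, 20 for histogram).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the xdiff function): estimate the number of
 * lines in buf[0..size) by scanning at most `sample` lines from the start.
 */
static long guess_lines(const char *buf, size_t size, long sample)
{
	const char *cur = buf, *top = buf + size;
	long nl = 0;

	assert(sample > 0);

	/* One memchr() per sampled line, capped at `sample` calls. */
	while (nl < sample && cur < top) {
		const char *eol = memchr(cur, '\n', (size_t)(top - cur));
		cur = eol ? eol + 1 : top;
		nl++;
	}

	if (nl == 0)
		return 1;	/* empty buffer: still reserve one record */

	/*
	 * Average sampled line length, extrapolated to the whole buffer.
	 * Each sampled line consumes at least one byte, so avg >= 1.
	 */
	size_t avg = (size_t)(cur - buf) / (size_t)nl;
	return (long)(size / avg) + 1;
}
```

A caller would presumably allocate room for that many records up front and grow the array only if the guess turns out too low. A smaller `sample` means fewer memchr() calls but a rougher estimate.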