Re: [RFC/PATCH 0/3] teach --histogram to diff

Tay Ray Chuan <rctay89@xxxxxxxxx> · Thu, 14 Jul 2011 00:34:14 +0800

On Tue, Jul 12, 2011 at 10:19 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Jul 11, 2011 at 23:10, Tay Ray Chuan <rctay89@xxxxxxxxx> wrote:
>> [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram
>
> Do we need sampling at all for histogram? Can you skip it?

Sampling is done to guess the number of lines in the file. This guess
is then used to preallocate memory for the list of records. (It is
only a guess; if we find more records, we allocate more memory.)
Preallocating saves us repeated malloc() calls, which gives a
performance boost.
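To make the preallocation idea concrete, here is a minimal sketch in C
of a record list that is sized from a line-count guess up front and
only falls back to realloc() when the guess turns out too low. The
names (struct record, record_list_init(), record_list_add(),
record_list_fill()) are hypothetical, not xdiff's own, and the
doubling growth policy is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one line of the input. */
struct record {
	const char *ptr; /* start of the line */
	long size;       /* length of the line, including '\n' */
};

struct record_list {
	struct record *recs;
	long nrec;  /* records stored so far */
	long alloc; /* current capacity */
	int grows;  /* realloc() calls needed after the initial malloc() */
};

/* Preallocate room for the guessed number of lines. */
static int record_list_init(struct record_list *rl, long guess)
{
	if (guess < 1)
		guess = 1;
	rl->recs = malloc(guess * sizeof(*rl->recs));
	if (!rl->recs)
		return -1;
	rl->nrec = 0;
	rl->alloc = guess;
	rl->grows = 0;
	return 0;
}

/* Append a record, doubling the capacity only if the guess was too small. */
static int record_list_add(struct record_list *rl, const char *ptr, long size)
{
	if (rl->nrec == rl->alloc) {
		long nalloc = rl->alloc * 2;
		struct record *n = realloc(rl->recs, nalloc * sizeof(*n));
		if (!n)
			return -1;
		rl->recs = n;
		rl->alloc = nalloc;
		rl->grows++;
	}
	assert(rl->nrec < rl->alloc); /* room for one more record */
	rl->recs[rl->nrec].ptr = ptr;
	rl->recs[rl->nrec].size = size;
	rl->nrec++;
	return 0;
}

/* Split buf into lines, storing one record per line. */
static int record_list_fill(struct record_list *rl, const char *buf, long len)
{
	const char *cur = buf, *top = buf + len;

	while (cur < top) {
		const char *nl = memchr(cur, '\n', top - cur);
		const char *end = nl ? nl + 1 : top;

		if (record_list_add(rl, cur, (long)(end - cur)) < 0)
			return -1;
		cur = end;
	}
	return 0;
}
```

With an accurate guess, filling the list costs one malloc() and no
realloc(); starting from a guess of 1 on a 4-line input costs two
extra realloc() calls, which is the overhead preallocation avoids.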

But sampling has its own cost: previously, we ran up to 256
memchr('\n') calls within an mmfile "block". For histogram diff, we
cut that cap down to 20 (but not for the other diff algorithms; see
the relevant patch text for more). I think this strikes a good
balance between the time spent guessing the line count and the time
saved by preallocating memory.
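The trade-off can be sketched with a simplified, hypothetical
guess_lines() that scans at most `sample` lines from the start of a
plain buffer, takes their average length, and extrapolates to the
whole buffer. It is modeled loosely on xdiff's xdl_guess_lines(), but
works on a char buffer rather than an mmfile_t, so treat it as an
illustration of the idea rather than git's code. The cost of the
guess is bounded by `sample` memchr() calls, so passing 20 instead of
256 bounds the scan at 20 lines.

```c
#include <assert.h>
#include <string.h>

/*
 * Estimate the number of lines in buf from its first `sample` lines:
 * total size divided by the average length of the sampled lines.
 */
static long guess_lines(const char *buf, long len, long sample)
{
	const char *cur = buf, *top = buf + len;
	long nl = 0;
	long scanned;

	assert(sample > 0);

	/* At most `sample` memchr() calls, however big the buffer is. */
	while (nl < sample && cur < top) {
		const char *p = memchr(cur, '\n', top - cur);

		nl++;
		cur = p ? p + 1 : top;
	}

	/* Each counted line spans at least one byte, so scanned >= nl. */
	scanned = (long)(cur - buf);
	if (nl && scanned)
		nl = len / (scanned / nl);

	/* Add one so the result is never zero. */
	return nl + 1;
}
```

For a file whose lines all have the same length, a sample of 20 gives
the same estimate as 256; the smaller sample only loses accuracy when
the first lines are unrepresentative of the rest of the file, and even
then an underestimate just means a few extra reallocations.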

-- 
Cheers,
Ray Chuan