Re: [RFC/PATCH 0/3] teach --histogram to diff

Shawn Pearce <spearce@xxxxxxxxxxx> · Tue, 12 Jul 2011 07:19:47 -0700

On Mon, Jul 11, 2011 at 23:10, Tay Ray Chuan <rctay89@xxxxxxxxx> wrote:
> (Shawn, I was held up with the patch messages, sorry for the delay.)
>
> Port JGit's HistogramDiff(Index) over to C. This algorithm extends the
> patience algorithm to "support low-occurrence common elements" [1].
>
> Rough numbers show that it is a faster alternative to its --patience
> cousin, as well as to the default Myers algorithm:
>
>  $ time ./git log --histogram -p v1.0.0 >/dev/null
>
>  real    0m12.998s
>  user    0m11.506s
>  sys     0m1.487s
>  $ time ./git log -p v1.0.0 >/dev/null
>
>  real    0m13.575s
>  user    0m12.101s
>  sys     0m1.468s
>  $ time ./git log --patience -p v1.0.0 >/dev/null
>
>  real    0m14.978s
>  user    0m13.508s
>  sys     0m1.464s

Nice!

Not the big difference that it is for us in JGit (between histogram
and Myers), but it's nice to see an improvement here, even if it is
only about 0.6s for the entire 1.0.0 history. How do the diffs come
out? One of the arguments for patience diff is that the formatting can
sometimes be more readable for certain changes, but it's slower.
Histogram tries to apply an algorithm similar to patience in order to
get the formatting benefits, but also some performance improvements.
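
To make the "low-occurrence common elements" idea concrete, here is a
rough sketch in Python of just the anchor-selection step. It is not the
JGit code or the C port from this series; the function names and the
sample input are made up for illustration, and the real algorithm also
extends the matched region and recurses on both sides of it. The
assumption is that patience only anchors on lines unique to both sides,
while histogram accepts the common line that occurs least often.

```python
from collections import Counter


def unique_common_lines(a, b):
    """Patience-style candidates: lines occurring exactly once in both a and b."""
    count_a, count_b = Counter(a), Counter(b)
    return [line for line in a if count_a[line] == 1 and count_b[line] == 1]


def lowest_occurrence_common_line(a, b):
    """Histogram-style candidate (illustrative only).

    Returns (line, occurrences_in_a) for the line of b that also appears
    in a and has the fewest occurrences in a, or None if nothing is shared.
    Ties are resolved in favour of the line seen first in b.
    """
    count_a = Counter(a)  # the "histogram" of side a
    best = None
    for line in b:
        n = count_a.get(line, 0)
        if n == 0:
            continue  # not a common line
        if best is None or n < best[1]:
            best = (line, n)
    return best


# Hypothetical input in which no line is unique on both sides.
old = ["x", "y", "y", "x", "y"]
new = ["y", "x", "z", "x", "y"]

print(unique_common_lines(old, new))
print(lowest_occurrence_common_line(old, new))
```

With this input the patience-style helper finds no anchor at all, while
the histogram-style helper still picks "x", the least frequent shared
line, which is the kind of case the extension is meant to cover.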

Have you looked at a patch that differs in output between Myers and
patience, and then compared those to the histogram version?

> The first patch implements JGit's HistogramDiff(Index) proper. The
> second and third patches aren't essential but yield performance gains.
...
> [RFC/PATCH 1/3] teach --histogram to diff
> [RFC/PATCH 2/3] xdiff/xprepare: skip classification
> [RFC/PATCH 3/3] xdiff/xprepare: use a smaller sample size for histogram

Do we need sampling at all for histogram? Can you skip it?

-- 
Shawn.