Porting JGit HistogramDiff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 31, 2011 at 09:05, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>  * Shawn Pearce says that the diff implementation JGit uses (histogram
>   diff) performs way better than the xdiff implementation we use by
>   default. It would be great if somebody can spend time taking a look at
>   it and possibly port it back to C-git.

The idea that JGit's HistogramDiff is faster came from Robin Rosenburg in [1]:

> $ time jgit log -M -p >jgit.txt
>
> real	0m3.880s
> user	0m7.137s
> sys	0m0.807s
>
> $ time git log -M -p >cgit.txt
>
> real	0m16.420s
> user	0m14.936s
> sys	0m0.899s
>
> Seems JGit beats the pants off C Git. A quick scans implies both version
> produce correct diffs, but don't post this to the Git ML list just yet :)


HistogramDiff is a variation of Patience Difference[2].  Within JGit
it is implemented as two classes:

  HistogramDiff[3] - the top level driver

  HistogramDiffIndex[4] - the hash table used to count and match occurrences


Because its based on a hash table, the hash function for lines matters
a lot.  xdiff's hash is crap, it generates too many collisions on the
linux-2.6 repository's source files.  We use djb's hash function,
which you can find in our RawTextComparator class[5].  We actually
tested a number of hash functions using a test program [6].

I may try to port this myself, but it would be great if someone else
beat me to it.  :-)


[1]  http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00897.html
[2]  http://bramcohen.livejournal.com/73318.html
[3]  http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiff.java;hb=HEAD
[4]  http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiffIndex.java;hb=HEAD
[5]  http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/RawTextComparator.java;hb=HEAD
[6]  http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/debug/TextHashFunctions.java;hb=HEAD

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]