On Mon, Jan 31, 2011 at 09:05, Junio C Hamano <gitster@xxxxxxxxx> wrote: > > * Shawn Pearce says that the diff implementation JGit uses (histogram > diff) performs way better than the xdiff implementation we use by > default. It would be great if somebody can spend time taking a look at > it and possibly port it back to C-git. The idea that JGit's HistogramDiff is faster came from Robin Rosenburg in [1]: > $ time jgit log -M -p >jgit.txt > > real 0m3.880s > user 0m7.137s > sys 0m0.807s > > $ time git log -M -p >cgit.txt > > real 0m16.420s > user 0m14.936s > sys 0m0.899s > > Seems JGit beats the pants off C Git. A quick scans implies both version > produce correct diffs, but don't post this to the Git ML list just yet :) HistogramDiff is a variation of Patience Difference[2]. Within JGit it is implemented as two classes: HistogramDiff[3] - the top level driver HistogramDiffIndex[4] - the hash table used to count and match occurrences Because its based on a hash table, the hash function for lines matters a lot. xdiff's hash is crap, it generates too many collisions on the linux-2.6 repository's source files. We use djb's hash function, which you can find in our RawTextComparator class[5]. We actually tested a number of hash functions using a test program [6]. I may try to port this myself, but it would be great if someone else beat me to it. :-) [1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00897.html [2] http://bramcohen.livejournal.com/73318.html [3] http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiff.java;hb=HEAD [4] http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/HistogramDiffIndex.java;hb=HEAD [5] http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit/src/org/eclipse/jgit/diff/RawTextComparator.java;hb=HEAD [6] http://egit.eclipse.org/w/?p=jgit.git;a=blob;f=org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/debug/TextHashFunctions.java;hb=HEAD -- Shawn. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html