[PATCH] Documentation/diff-options: explain different diff algorithms

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 23 Jul 2018 17:36:19 -0700

As a user I wondered how the diff algorithms differ. Offer at least
a basic explanation of the differences between them.

Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx>
---
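A quick way to compare the four algorithms on the corner case from the
`histogram` entry is sketched below. It assumes `git`, `seq` and `grep`
are on the PATH and writes `one` and `two` into the current directory.
Counting changed lines with grep is only an illustration, not something
the documentation prescribes.

```shell
#!/bin/sh
# Recreate the two files from the histogram corner case in this patch.
seq 1 100 >one
echo 99 >two
seq 1 2 98 >>two

for algo in myers minimal patience histogram
do
	# Count added/removed lines, skipping the "--- a/one" and
	# "+++ b/two" file headers. git diff --no-index exits with 1
	# when the files differ; only grep's status ends the pipeline.
	lines=$(git diff --no-index --diff-algorithm=$algo one two |
		grep -c '^[-+][^-+]')
	echo "$algo: $lines changed lines"
done
```

A larger count for one algorithm than for the others means it produced a
less compact diff for this particular input. It says nothing about
readability on real code.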
 Documentation/diff-options.txt | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972f..0d765482027 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,16 +94,34 @@ diff" algorithm internally.
 	Choose a diff algorithm. The variants are as follows:
 +
 --
-`default`, `myers`;;
-	The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-	Spend extra time to make sure the smallest possible diff is
-	produced.
+	A diff as produced by the basic greedy algorithm described in
+	link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations].
+`default`, `myers`;;
+	The same algorithm as `minimal`, extended with a heuristic to
+	reduce extensive searches. Currently, this is the default.
 `patience`;;
-	Use "patience diff" algorithm when generating patches.
+	Use the "patience diff" algorithm when generating patches. This
+	matches the longest common subsequence of unique lines on
+	both sides, recursively. It is named after the way the
+	longest subsequence is found, which is a byproduct of the
+	patience sorting algorithm. If there are no unique lines left
+	it falls back to `myers`. Empirically this algorithm produces
+	more readable output for code, but it does not guarantee
+	the shortest output.
 `histogram`;;
-	This algorithm extends the patience algorithm to "support
-	low-occurrence common elements".
+	This algorithm re-implements the `patience` algorithm with
+	"support of low-occurrence common elements" and only picks
+	one element of the LCS for the recursion. It is often the
+	fastest, but in corner cases (when there are many longest
+	common subsequences of the same length) it produces bad
+	results as seen in:
+
+		seq 1 100 >one
+		echo 99 >two
+		seq 1 2 98 >>two
+		git diff --no-index --histogram one two
+
 --
 +
 For instance, if you configured diff.algorithm variable to a
-- 
2.18.0.345.g5c9ce644c3-goog