As a user I wondered what the diff algorithms are about. Offer at least a basic explanation on the differences of the diff algorithms. Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx> --- Documentation/diff-options.txt | 10 +++++++--- Documentation/git-diff.txt | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index f394608b42c..eae033a21ea 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -91,14 +91,18 @@ appearing as a deletion or addition in the output. It uses the "patience diff" algorithm internally. --diff-algorithm={patience|minimal|histogram|myers}:: - Choose a diff algorithm. The variants are as follows: + Choose a diff algorithm. See the discussion of DIFF ALGORITHMS +ifndef::git-diff[] + in linkgit:git-diff[1] +endif::git-diff[] + . The variants are as follows: + -- `default`, `myers`;; The basic greedy diff algorithm. Currently, this is the default. `minimal`;; - Spend extra time to make sure the smallest possible diff is - produced. + The same algorithm as `myers`, but spend extra time to make + sure the smallest possible diff is produced. `patience`;; Use "patience diff" algorithm when generating patches. `histogram`;; diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt index b180f1fa5bf..b182389aaae 100644 --- a/Documentation/git-diff.txt +++ b/Documentation/git-diff.txt @@ -119,6 +119,40 @@ include::diff-options.txt[] include::diff-format.txt[] +DIFF ALGORITHMS +--------------- +`Myers` + +A diff as produced by the basic greedy algorithm described in +link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm and its Variations]. +with a run time of O(M + N + D^2). It employs a heuristic to allow for +a faster diff at the small cost of diff size. +The `minimal` algorithm has that heuristic turned off. + +`Patience` + +This algorithm by Bram Cohen matches the longest common subsequence +of unique lines on both sides, recursively. It obtained its name by +the way the longest subsequence is found, as that is a byproduct of +the patience sorting algorithm. If there are no unique lines left +it falls back to `myers`. Empirically this algorithm produces +a more readable output for code, but it does not garantuee +the shortest output. + +`Histogram` + +This algorithm finds the longest common substring and recursively +diffs the content before and after the longest common substring. +If there are no common substrings left, fallback to `myers`. +This is often the fastest, but in corner cases (when there are +many common substrings of the same length) it produces bad +results as seen in: + + seq 1 100 >one + echo 99 > two + seq 1 2 98 >>two + git diff --no-index --histogram one two + EXAMPLES -------- -- 2.18.0.597.ga71716f1ad-goog