On Fri, Aug 31, 2018 at 2:39 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi,
>
> On Thu, 30 Aug 2018, Stefan Beller wrote:
>
> > On Thu, Aug 30, 2018 at 12:20 PM Jeff King <peff@xxxxxxxx> wrote:
> > >
> > > [...] Myers does not promise to find the absolute minimal diff. [...]
> >
> > The `Myers` (our default) diff algorithm is really the Myers algorithm +
> > a heuristic that cuts off the long tail when it is very costly to
> > compute the minimal diff.
>
> IIRC Myers (the original, which we spell `minimal`) promises minimal
> diffs only for -U0. As soon as you add context, Myers might in fact end
> up with a suboptimal diff, even without that heuristic.

Ah, the last four words made it clear.

I debated the cost function for diffs some time ago and came to the
conclusion that charging a flat price of 1 per added or removed line in
the search for the 'lowest cost diff' is pretty good: it does an okay
job in the broad world and does not 'overfit' assumptions onto a
problem.

For example, in some patches I pay more attention to the lines removed
than to the lines added, or vice versa, so assigning different costs to
added and removed lines would make sense. (It depends on the type of
patch, but that cannot be deduced at the low level of the diff
machinery.)

Starting a new hunk (i.e. adding a cost for -U<n> with n > 0) could be
made costly. In fact we have an after-Myers optimization in the xdiff
code that checks whether hunks can be coalesced; if we could account for
that cost at diff computation time instead, this might already make for
better diffs.

Another example is the number of alternations between added, removed and
context parts. If I have to review only one large added part and one
large removed part, that may be more understandable than interleaved
adds and removes. (I think --patience goes in that direction, but it
optimizes for a different metric.)

Stefan
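P.S.: To make the cost discussion concrete, here is a toy sketch of a
weighted line-diff cost. It is purely hypothetical and has nothing to do
with git's actual xdiff code: the knobs `del_cost`, `add_cost` and
`hunk_open` are made-up parameters that model the ideas above
(asymmetric add/remove weights, and a penalty charged once per
contiguous changed run, so interleaved changes cost more than one big
hunk):

```python
from functools import lru_cache

def diff_cost(a, b, del_cost=1.0, add_cost=1.0, hunk_open=0.0):
    """Minimal total cost to turn line list a into line list b.

    Toy model: each removed line costs del_cost, each added line costs
    add_cost, and hunk_open is charged once at the start of every
    contiguous run of changes. With the defaults this degenerates to
    the flat 'cost 1 per changed line' model discussed above.
    """
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def go(i, j, in_hunk):
        # (i, j) = how much of a and b is already consumed;
        # in_hunk = whether the previous step was an add/remove,
        # so hunk_open is only paid when a new changed run starts.
        if i == len(a) and j == len(b):
            return 0.0
        best = float("inf")
        if i < len(a) and j < len(b) and a[i] == b[j]:
            best = go(i + 1, j + 1, False)      # context line, free
        open_pen = 0.0 if in_hunk else hunk_open
        if i < len(a):                          # remove a line of a
            best = min(best, del_cost + open_pen + go(i + 1, j, True))
        if j < len(b):                          # add a line of b
            best = min(best, add_cost + open_pen + go(i, j + 1, True))
        return best

    return go(0, 0, False)
```

With hunk_open > 0, two changes separated by context cost more than the
same number of changed lines in a single run, which is exactly the
"fewer, larger hunks" preference sketched above.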