Hi, I've been working on combine diff recently and have seen that the current implementation does not produce optimal results for coalescing lost lines. I would like to discuss some improvements. I think we can define an optimal solution as the shortest diff output. That is that we coalesced as much line as we could. It means that if each parent has the same file removed, the output will be n lines long (all lines coalesce to each others). The current implementation doesn't look for optimal solution but for the easiest/greedy solution. It also stops matching with a parent if it can't find a match. As an example, it means that losing [1, 2, 3, 4] from parent A, and [3, 1, 2, 4] from parent B would give: - 1 - 2 --3 - 4 -1 -2 -4 While if we invert the two parents order we will have: - 3 --1 --2 - 4 -3 -4 While clearly, in both cases, the optimal solution would be close to: - 3 --1 --2 -3 --4 Let's say we have only one empty file (deleted), with p parents. In each parent, the file was n lines long, but the lines don't have to be the same. It clearly looks like an extension/variation of the Longest Common Subsequence (LCS) problem, but with p sequences (and the common subsequences doesn't have to be common to all sequences). LCS dynamic programming is O(n²) in time and space complexity for two sequences. We could extend it this way: After processing two of the p parents, let's find LCS and build a new optimal list of removals. Then find LCS again between this new list and the third parent removals. And repeat until we made each parents. Best-case analysis: All p parents have the same n lines. We will find LCS and provide a n lines (the same lines) new list in O(n²), and then run it again in O(n²) with the next parent, etc. It will end-up being O(pn²). Worst-case analysis: All p parents have no lines in common. We will find LCS and provide a 2n new list in O(n²). Then we run it again in O(2n x n), and again O(3n x n), etc, until O(pn x n). When we sum these all, we end-up with O(p² x n²) Does it make any sense ? And is it worth implementing ? Cheers, Antoine -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html