Hi Davide, I've been working tonight, trying to make libxdiff support the patience diff algorithm, but I've totally failed, because I _thought_ I understood what xdl_split was doing, but it appears I don't. [ For the readers playing at home, the patience diff algorithm is explained after my sig. ] What I did is to: (1) add a flag to the xenv in xdl_split that says that I want a patience "split". (2) Then in xdl_split, if that bit is set, I compute the longest common subsequence from the patience diff. (3) for each split it computes I call xdl_recs_cmp on that interval. What I thought it would achieve is that I force some boundaries at which libxdiff _must_ resync. Though, it seems that for some reason it doesn't work, probably because the "ha" stuff and the boundaries absolutely don't work the way I thought it did. So where is the place I should do that ? I suspect it should be partly in xprepare.c but I'm a bit stuck right now. Any pointer on how the stuff in xprepare.c and xdiffi.c work would help greatly, it's really not self-evident to me :) -- ·O· Pierre Habouzit ··O madcoder@xxxxxxxxxx OOO http://www.madism.org Patience diff ============= Basically, it's an algorithm that considers line from the left file A and the right file B. It searches for lines that are unique *IN* A and unique *IN* B as well. Then using a clever O(n log log n) where n is that number of lines, you extract the longer common sequence of those lines. This defines two splits of file A and B. For each corresponding chunks, you then reduce the lines matching at the beginning and the end, and reiterate the algorithm on the interior of that space, until there are _no_ unique lines at all, and then you apply a usual diff algorithm to generate the diff in that final section. http://alfedenzo.livejournal.com/170301.html has a nice visual explanation of the fact, even if it forgets about the "zones" trimming that helps for the efficiency. It's also often wrong to generate the stacks in the order shown on the blog, because you recurse from the max line to the min line, which is not the natural order to work on a diff. But that's just a matter of "inverting" all the comparisons which is pretty obvious to do. The difference of output can be seen on http://glandium.org/blog/?p=120 where the patience diff picks the line "include $(topsrcdir)/config/rules.mk" as being unique on the left and the right file, hence will use it as sync point rather than using it in a diff.
Attachment:
pgpY0wItMWNVH.pgp
Description: PGP signature