Re: libxdiff and patience diff

Pierre Habouzit <madcoder@xxxxxxxxxx> · Tue, 04 Nov 2008 17:15:48 +0100

On Tue, Nov 04, 2008 at 03:57:44PM +0000, Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 4 Nov 2008, Pierre Habouzit wrote:
> 
> > The nasty thing about the patience diff is that it still needs the usual
> > diff algorithm once it has split the file into chunks separated by
> > "unique lines".
> 
> Actually, it should try to apply patience diff again in those chunks, 
> separately.

Yes, that's what I do, but it reaches a fixed point as soon as you find no
unique lines between the newly found ones, or when that space is "empty".
E.g. you could have the two following hunks:

File A       File B
1            2
2            1
1            2
2            1
1            2
2
1

The simple leading/trailing reduction will do nothing, and you don't
have any shared unique lines, so on that chunk you must apply the usual
diff algorithm.
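
To make that concrete, here is a rough sketch in Python (not the libxdiff
code; trim_common() and shared_unique_lines() are names made up for
illustration) that runs both steps on the example above. Both come back
empty-handed, which is exactly the case where patience diff has nothing to
anchor on:

```python
from collections import Counter

def trim_common(a, b):
    """Count the common leading and trailing lines of a and b."""
    prefix = 0
    while prefix < len(a) and prefix < len(b) and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Never let the suffix overlap the prefix that was already matched.
    while (suffix < len(a) - prefix and suffix < len(b) - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1
    return prefix, suffix

def shared_unique_lines(a, b):
    """Lines that occur exactly once in a and exactly once in b."""
    count_a, count_b = Counter(a), Counter(b)
    return [line for line in a if count_a[line] == 1 and count_b[line] == 1]

file_a = ["1", "2", "1", "2", "1", "2", "1"]
file_b = ["2", "1", "2", "1", "2"]

print(trim_common(file_a, file_b))          # (0, 0): nothing to trim
print(shared_unique_lines(file_a, file_b))  # []: no anchors for patience diff
```

With no prefix, no suffix and no anchors, recursing would just hand the
same chunk back, hence the fallback.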

> > So you cannot make really independent stuff. What I could do is put most 
> > of the xpatience diff into xpatience.c but it would still have to use 
> > some functions from xdiffi.c that are currently private, so it messes 
> > somehow the files more than it's worth IMHO.
> 
> I think it is better that you use the stuff from xdiffi.c through a well 
> defined interface, i.e. _not_ mess up the code by mingling it together 
> with the code in xdiffi.c.  The code is hard enough to read already.

Hmmm. I'll see to that later, once I have something that works.

> Oh, BTW, "ha" is a hash of the lines which is used to make the line 
> matching more performant.  You will see a lot of "ha" comparisons before 
> actually calling xdl_recmatch() for that reason.  Incidentally, this is 
> also the hash that I'd use for the hash multi-set I was referring to.

Yeah, that's what I assumed it would be.
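
Just to check we mean the same thing, a rough sketch in Python (the Line
class, lines_match() and the multi-set helpers are hypothetical names, and
crc32 is only a stand-in for whatever hash xdiff really computes): compare
the cheap hash first, do the full comparison only when the hashes agree, and
key the multi-set on that same hash:

```python
import zlib

class Line:
    def __init__(self, text):
        self.text = text
        # Stand-in hash, an assumption; not the hash libxdiff uses.
        self.ha = zlib.crc32(text.encode())

def lines_match(l1, l2):
    """Cheap hash test first; full comparison only if the hashes agree."""
    return l1.ha == l2.ha and l1.text == l2.text

def build_multiset(lines):
    """Map hash -> list of [text, count]; the text check handles collisions."""
    buckets = {}
    for line in lines:
        entries = buckets.setdefault(line.ha, [])
        for entry in entries:
            if entry[0] == line.text:
                entry[1] += 1
                break
        else:
            entries.append([line.text, 1])
    return buckets

def count_of(buckets, line):
    """How many times this line occurs in the multi-set (0 if absent)."""
    for text, count in buckets.get(line.ha, []):
        if text == line.text:
            return count
    return 0

lines = [Line(t) for t in ["1", "2", "1"]]
buckets = build_multiset(lines)
print(count_of(buckets, Line("2")))  # 1, i.e. unique on this side
```

A line is then a candidate anchor when its count is 1 in the multi-sets of
both files.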

> Oh, and I am not sure that it is worth your time trying to get it to run 
> with the linear list, since you cannot reuse that code afterwards, and 
> have to spend the same amount of time to redo it with the hash set.

Having the linear list (actually an array) work would show me that I hook
in at the proper place. Replacing a data structure doesn't scare me
because I've split the functions properly.

> I am awfully short on time, so it will take some days until I can review 
> what you have already, unfortunately.

NP, it was just in case, because I'm horribly stuck with that code right
now ;)

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org