Re: diff'ing files ...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 7 Jun 2011, Jeff King wrote:

> On Tue, Jun 07, 2011 at 10:12:35PM +0000, Albretch Mueller wrote:
> 
> > > ... binary diffs, though I don't know offhand the details of the algorithm.
> > ~
> >  this is the part that I need ;-)
> > ~
> >  Reading the source code without knowing the basic underlying
> > ideas/algorithm (just an outline if possible) won't help much
> 
> You could read the comments in the source:
> 
>   $ head -n 7 diff-delta.c
>   /*
>    * diff-delta.c: generate a delta between two buffers
>    *
>    * This code was greatly inspired by parts of LibXDiff from Davide Libenzi
>    * http://www.xmailserver.org/xdiff-lib.html
>    *
>    * Rewritten for GIT by Nicolas Pitre <nico@xxxxxxxxxxx>, (C) 2005-2007
> 
> According to the xdiff page linked:
> 
>   For binary files, LibXDiff implements both (with some modification)
>   the algorithm described in File System Support for Delta Compression
>   by Joshua P.  MacDonald, and the algorithm described in Fingerprinting
>   By Random Polynomials by Michael O. Rabin.
> 
> Nicolas (cc'd) might be able to say what, if any, substantive changes
> were made from those works.

The libxdiff code was pretty generic so to be highly portable and usable 
for many application types.  What I did is to get rid of everything that 
git strictly didn't need in order to make the code as simple as 
possible, and most importantly as fast as possible.  And then the code 
was optimized even further, sacrificing on clarity a bit, to make it 
even faster.  Since this code is the very inner loop of every delta 
search for best delta matches, every bit of optimization counts.

And then further modifications were made to avoid pathological corner 
cases which were taking too much time for little gain in the Git 
context.

I also changed the output encoding to make it tighter.

So, strictly speaking, the current code in Git doesn't bear any 
resemblance with the libxdiff code at all.  However the basic algorithm 
behind both implementations is the same.  Studying the libxdiff version 
is probably easier in order to gain an understanding of how this works.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]