Dear diary, on Mon, Apr 24, 2006 at 05:19:01PM CEST, I got a letter
where Geert Bosch <bosch@xxxxxxxxxxx> said that...
> > But here comes the sad part. Even after simplifying the code as much
> > as I could, performance is still significantly worse than the current
> > diff-delta.c code. Repacking again the same Linux kernel repository
> > with the current code:
>
> That's unexpected, but I can see how this could happen if most files
> have very few differences and are relatively small. For such cases,
> almost any hash will do, and the more complicated hashing will be more
> compute intensive.
>
> I have benchmarked my original diff code on a set of large files with
> lots of changes. These are the hardest to get right, and the hardest
> to get good performance on. Just try diffing any two large
> (uncompressed) tar files, and you'll see. On many such large files,
> the new code is orders of magnitude faster. In these cases, the
> resulting deltas are also much smaller.
>
> The comparison is a bit like that between an O(n^2) sort that is fast
> on small or mostly sorted inputs (but horrible on large ones) and a
> more complex O(n log n) algorithm that is a bit slower in the simple
> cases but far faster in the complex ones.

Can't you just switch between different delta algorithms based on some
heuristic like the blob size?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.
I think I have forgotten this before.
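
To make the suggestion concrete, here is a minimal sketch of such a
size-based dispatch. diff_delta_simple(), diff_delta_scalable(),
diff_delta_auto() and the 64kB cutoff are all hypothetical: the first
two stand in for the current diff-delta.c code and the new algorithm,
and the threshold would have to be tuned against real repositories.

#define BIG_BLOB_THRESHOLD (64 * 1024)	/* arbitrary placeholder cutoff */

/* hypothetical stand-in for the current diff-delta.c algorithm:
 * low per-byte cost, fast on small or mostly similar inputs */
extern void *diff_delta_simple(const void *src, unsigned long src_size,
			       const void *trg, unsigned long trg_size,
			       unsigned long *delta_size);

/* hypothetical stand-in for the new algorithm: more setup work,
 * but far better behaviour on large inputs with many changes */
extern void *diff_delta_scalable(const void *src, unsigned long src_size,
				 const void *trg, unsigned long trg_size,
				 unsigned long *delta_size);

void *diff_delta_auto(const void *src, unsigned long src_size,
		      const void *trg, unsigned long trg_size,
		      unsigned long *delta_size)
{
	/* below the cutoff the simple code's lower constant factors
	 * win; above it the better asymptotics dominate */
	if (src_size < BIG_BLOB_THRESHOLD && trg_size < BIG_BLOB_THRESHOLD)
		return diff_delta_simple(src, src_size,
					 trg, trg_size, delta_size);
	return diff_delta_scalable(src, src_size,
				   trg, trg_size, delta_size);
}

This mirrors the O(n^2)-sort-vs-O(n log n)-sort analogy quoted above:
the crossover between the two algorithms is driven by input size, so a
simple size test may capture most of the benefit of both.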