Dear diary, on Mon, Apr 24, 2006 at 05:19:01PM CEST, I got a letter
where Geert Bosch <bosch@xxxxxxxxxxx> said that...
> > But here comes the sad part. Even after simplifying the code as much
> > as I could, performance is still significantly worse than the current
> > diff-delta.c code. Repacking again the same Linux kernel repository
> > with the current code:
>
> That's unexpected, but I can see how this could happen if most files
> have very few differences and are relatively small. For such cases,
> almost any hash will do, and the more complicated hashing will be more
> compute intensive.
>
> I have benchmarked my original diff code on a set of large files with
> lots of changes. These are the hardest to get right, and the hardest
> to get good performance on. Just try diffing any two large
> (uncompressed) tar files, and you'll see. On many such large files,
> the new code is orders of magnitude faster. In these cases, the
> resulting deltas are also much smaller.
>
> The comparison is a bit like that between an O(n^2) sort that is fast
> on small or mostly sorted inputs (but horrible on large ones) and a
> more complex O(n log n) algorithm that is a bit slower in the simple
> cases but far faster in the complex ones.

Can't you just switch between different delta algorithms based on some
heuristic like the blob size?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.
I think I have forgotten this before.
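
To make the suggestion concrete, here is a minimal sketch of such a
size-based dispatch. diff_delta_simple(), diff_delta_scalable(),
diff_delta_auto() and the 64kB cutoff are all hypothetical: the first
two stand in for the current diff-delta.c code and the new algorithm,
and the threshold would have to be tuned against real repositories.

#define BIG_BLOB_THRESHOLD (64 * 1024)	/* arbitrary placeholder cutoff */

/* hypothetical stand-in for the current diff-delta.c algorithm:
 * low per-byte cost, fast on small or mostly similar inputs */
extern void *diff_delta_simple(const void *src, unsigned long src_size,
			       const void *trg, unsigned long trg_size,
			       unsigned long *delta_size);

/* hypothetical stand-in for the new algorithm: more setup work,
 * but far better behaviour on large inputs with many changes */
extern void *diff_delta_scalable(const void *src, unsigned long src_size,
				 const void *trg, unsigned long trg_size,
				 unsigned long *delta_size);

void *diff_delta_auto(const void *src, unsigned long src_size,
		      const void *trg, unsigned long trg_size,
		      unsigned long *delta_size)
{
	/* below the cutoff the simple code's lower constant factors
	 * win; above it the better asymptotics dominate */
	if (src_size < BIG_BLOB_THRESHOLD && trg_size < BIG_BLOB_THRESHOLD)
		return diff_delta_simple(src, src_size,
					 trg, trg_size, delta_size);
	return diff_delta_scalable(src, src_size,
				   trg, trg_size, delta_size);
}

This mirrors the O(n^2)-sort-vs-O(n log n)-sort analogy quoted above:
the crossover between the two algorithms is driven by input size, so a
simple size test may capture most of the benefit of both.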