Re: Use a *real* built-in diff generator

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 24 Mar 2006, Linus Torvalds wrote:

- the libxdiff algorithm is different, and I bet GNU diff has gotten a
  lot more testing. And the thing is, generating a diff is not an exact
  science - you can get two different diffs (and you will), and they can
  both be perfectly valid. So it's not possible to "validate" the
  libxdiff output by just comparing it against GNU diff.

Correct, the diff(A, B) is not unique. If you look inside the test directory, there's an xregression binary that does:

1) Random generate A
2) Create B by random changing A
3) Create D=A-B
4) Verify that B+D==A and A-D==B (using the library patch function)

It does and repeat this operation continuosly, for both text (using text diff/patch) and binary (using binary diff/patch). It ran several days w/out finding errors, so I've a good confidence about it.



- GNU diff does some nice eye-candy, like trying to figure out what the
  last function was, and adding that information to the "@@ .." line.
  libxdiff doesn't do that.

This, I don't think is a natural part of a generic text/binary diff/patch library. If you feel it is important, you could post-process the diff, but IMO is kinda bogus.



- The libxdiff thing has some known deficiencies. In particular, it gets
  the "\No newline at end of file" case wrong. So this is currently for
  the experimental branch only. I hope Davide will help fix it.

This, need fix. At the moment, in my projects I enforce the final EOL if missing (look inside the file-load function inside the test directory).



Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.

That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.

Here you have two options. Either you suck in the libxdiff code and change it to drop/change the stuff you don't want (the whole libxdiff library compiled with -O2 is 33KB though). Or you use the library as is, like you'd use libz & co. Once you have your own load-mmfile, you can pretty much feed libxdiff as is. Not my choice though, so pick the one you think best for your project. I see you use XDF_NEED_MINIMAL. You might want to do some experiments with and without, to see how diff size changes, versus time.



Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old

If you look inside the test directory, I use a similar function. The reason of the mmfile born for a use I made of the library inside an embedded device where there was no guarantee of contiguos memory, and dat could have been generated in chunks. OTOH an mmfile with a single block is a perfectly valid mmfile ;)



PS: Another solution you have is to libify GNU diff by creating a
    diff_main() & co., usual libification wrapping. You'd need to change
    the exit() that diff throws with a setjmp/longjmp, and make it call
    you own mem alloc/free functions, in order to free up memory diff does
    not clear on return. I did it once, not many changes. This solution
    will give you all the GNU diff crud, like function names, etc...



- Davide


-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]