On Fri, 24 Mar 2006, Linus Torvalds wrote:
- the libxdiff algorithm is different, and I bet GNU diff has gotten a
lot more testing. And the thing is, generating a diff is not an exact
science - you can get two different diffs (and you will), and they can
both be perfectly valid. So it's not possible to "validate" the
libxdiff output by just comparing it against GNU diff.
Correct, the diff(A, B) is not unique. If you look inside the test
directory, there's an xregression binary that does:
1) Random generate A
2) Create B by random changing A
3) Create D=A-B
4) Verify that B+D==A and A-D==B (using the library patch function)
It does and repeat this operation continuosly, for both text (using text
diff/patch) and binary (using binary diff/patch). It ran several days
w/out finding errors, so I've a good confidence about it.
- GNU diff does some nice eye-candy, like trying to figure out what the
last function was, and adding that information to the "@@ .." line.
libxdiff doesn't do that.
This, I don't think is a natural part of a generic text/binary diff/patch
library. If you feel it is important, you could post-process the diff, but
IMO is kinda bogus.
- The libxdiff thing has some known deficiencies. In particular, it gets
the "\No newline at end of file" case wrong. So this is currently for
the experimental branch only. I hope Davide will help fix it.
This, need fix. At the moment, in my projects I enforce the final EOL if
missing (look inside the file-load function inside the test directory).
Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial <pointer,length> tuple.
That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.
Here you have two options. Either you suck in the libxdiff code and change
it to drop/change the stuff you don't want (the whole libxdiff library
compiled with -O2 is 33KB though). Or you use the library as is, like
you'd use libz & co. Once you have your own load-mmfile, you can pretty
much feed libxdiff as is. Not my choice though, so pick the one you think
best for your project.
I see you use XDF_NEED_MINIMAL. You might want to do some experiments with
and without, to see how diff size changes, versus time.
Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
If you look inside the test directory, I use a similar function. The
reason of the mmfile born for a use I made of the library inside an
embedded device where there was no guarantee of contiguos memory, and dat
could have been generated in chunks. OTOH an mmfile with a single block is
a perfectly valid mmfile ;)
PS: Another solution you have is to libify GNU diff by creating a
diff_main() & co., usual libification wrapping. You'd need to change
the exit() that diff throws with a setjmp/longjmp, and make it call
you own mem alloc/free functions, in order to free up memory diff does
not clear on return. I did it once, not many changes. This solution
will give you all the GNU diff crud, like function names, etc...
- Davide
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html