Hi, The svndiff format has proved more difficult to parse than expected. This series documents the current state of things, and though it is not complete, it should be ready for nitpicking by the masses. Patches 1-4 modify the line_buffer API by introducing a struct line_buffer to collect state that was previously held in global variables. Callers can use multiple line_buffers to manage input from multiple files at a time. Patches 5-10 add various utility functions to the line_buffer API (wrapping strbuf_fread(), fgetc(), etc). Putting the helpers there instead of having callers work with the FILE* directly means one could easily - tweak the input stream (to insert "link: " at the beginning for symlinks?); - trace reads, for debugging; or - use read() directly in place of stdio and limit the number of bytes buffered if one wants to. Patch 11 adds a data structure and function to manage a "sliding window" without using mmap() or fseek(). See the svndiff0 spec[1] for how this would be used. Patches 12 and 13 are some basic components for reading an svndiff0 file: reading variable-length integers and the opening magic bytes. Patch 15 makes the svn-fe test usable on systems (like Ram's) without libsvn-perl installed. It also should make the test easier to read for people unfamiliar with lib-git-svn.sh. Patch 16 is the delta parser/applier. This patch does _not_ add it to contrib/svn-fe, even though that would be useful, since the command-line interface is not set in stone yet. If you want to try it out, use the test-svn-fe command: test-svn-fe -d <preimage> <delta> <delta length> The preimage or delta arg can be /dev/stdin for use in a pipeline. Both are only read sequentially; they do not need to be regular files. One of the test cases is enormous. The svn delta lib doesn't use multiple windows except when dealing with relatively big files, but probably the test case should be replaced with a smaller, artificial example. One of the test cases does not pass. I also don't know how to apply the delta by hand --- it seems to have some extra bytes at the end. :( Unfortunately the svndiff0 spec is not as clear about when to stop reading as one might like The code separately maintains nominal and actual lengths for a few buffers, since truncated input is permitted (and even required) in the deltas svn produces, though the svndiff0 spec does not document the semantics of that. For svn-fe changes to take advantage of this code to handle the dumpfilev3 format, see <git://github.com/barrbrain/git.git>[2]. So now the full svnrdump | svn-fe | fast-import pipeline can be experienced. It still chokes on some deltas in the wild. Thoughts, cleanups, test cases, bug reports, improvements welcome. :) Enjoy, Jonathan Nieder (15): vcs-svn: Eliminate global byte_buffer[] array vcs-svn: Replace buffer_read_string()'s memory pool with a strbuf vcs-svn: Collect line_buffer data in a struct vcs-svn: Teach line_buffer to handle multiple input files vcs-svn: Make buffer_skip_bytes() report partial reads vcs-svn: Better support for reading large files vcs-svn: Add binary-safe read() function vcs-svn: Let callers peek ahead to find stream end vcs-svn: Allow input errors to be detected early vcs-svn: Allow character-oriented input vcs-svn: Add code to maintain a sliding view of a file vcs-svn: Learn to parse variable-length integers vcs-svn: Learn to check for SVN\0 magic compat: helper for detecting unsigned overflow vcs-svn: Add svn delta parser Ramkumar Ramachandra (1): t9010 (svn-fe): Eliminate dependency on svn perl bindings Makefile | 5 +- vcs-svn/line_buffer.txt | 8 +- vcs-svn/fast_export.c | 6 +- vcs-svn/fast_export.h | 5 +- vcs-svn/line_buffer.c | 99 +- vcs-svn/line_buffer.h | 29 +- vcs-svn/sliding_window.c | 65 + vcs-svn/sliding_window.h | 14 + vcs-svn/svndiff.c | 344 + vcs-svn/svndiff.h | 9 + vcs-svn/svndump.c | 29 +- vcs-svn/LICENSE | 2 + git-compat-util.h | 6 + test-line-buffer.c | 17 +- test-svn-fe.c | 37 +- t/t9010-svn-fe.sh | 29 +- t/t9010/Xerces.cpp.diff0 | Bin 0 -> 12185 bytes t/t9010/Xerces.cpp.done |54963 +++++++++++++++++++++++++++++++++++++++++++++ t/t9010/Xerces.cpp.src |55052 ++++++++++++++++++++++++++++++++++++++++++++++ t/t9010/newdata.diff0 | Bin 0 -> 19392 bytes t/t9010/newdata.done | 522 + t/t9010/src.diff0 | Bin 0 -> 74 bytes t/t9010/src.done | 522 + 23 files changed, 111677 insertions(+), 86 deletions(-) create mode 100644 vcs-svn/sliding_window.c create mode 100644 vcs-svn/sliding_window.h create mode 100644 vcs-svn/svndiff.c create mode 100644 vcs-svn/svndiff.h create mode 100644 t/t9010/Xerces.cpp.diff0 create mode 100644 t/t9010/Xerces.cpp.done create mode 100644 t/t9010/Xerces.cpp.src create mode 100644 t/t9010/blank.done create mode 100644 t/t9010/newdata.diff0 create mode 100644 t/t9010/newdata.done create mode 100644 t/t9010/src.diff0 create mode 100644 t/t9010/src.done [1] http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff [2] And some design notes: http://thread.gmane.org/gmane.comp.version-control.git/150005/focus=157119 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html