Hi fast importers,

I would like your thoughts on a few developments in fast-import
protocol (thanks to David, Ram, Sverre, Tomas, and Sam for work so
far).  If they seem good, I'd be happy to help make patches to other
backends so these can be implemented widely.

Contents: cat-blob command, filemodify (M) with trees, ls command.

cat-blob command
----------------
fast-import 1.7.4-rc0 added a new "cat-blob" feature.  It is meant to
allow exporters that receive changes in delta form to avoid having
to remember the full text of blobs already exported or re-retrieve
them from the source repository.

It works like this:

1. Out of band, the fast-import frontend and backend negotiate a
   channel for the backend to send replies to the frontend.  In git
   fast-import, this is a file descriptor, defaulting to stdout.  So
   you can do:

    mkfifo replies &&
    $frontend <replies |
    git fast-import --cat-blob-fd=3 3>replies

   The intent is that stdin would typically be a socket and this
   file descriptor would point to that.

2. The frontend (optionally) declares use of this feature by putting

    feature cat-blob

   at the beginning of the stream.

3. When the frontend needs a previously exported blob to use as
   delta preimage, it uses the cat-blob command.

    cat-blob :3

   The backend replies with something like

    7c8987a987ca98c blob 6
    hello

   More precisely, the output format is

    <dataref> SP 'blob' SP <length> LF
    <full text of blob> LF

   The <dataref> can be any text not including whitespace.

The frontend can rely on a little buffering if it wants to print a
command after the "cat-blob", but it must read the reply in its
entirety if it expects the backend to act on later commands.  In
other words, the cat-blob command is not guaranteed to be
asynchronous.

This protocol is used by the svn-fe[1] tool to handle Subversion dump
files in version 3 (--deltas) format and seems to work ok.

Does this look sane or does it need tweaking or more detailed
specification to be widely useful?
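
To make the reply format above concrete, here is a minimal Python
sketch of how a frontend might read one cat-blob reply from the reply
channel.  It is only an illustration under stated assumptions: the
function name read_cat_blob_reply is hypothetical, the stream is
assumed to be a binary file object opened on the reply channel, and
the 6-byte sample blob ("hello" plus a newline) is made up to match
the example reply.

```python
import io


def read_cat_blob_reply(stream):
    """Read one '<dataref> SP blob SP <length> LF <data> LF' reply.

    `stream` is assumed to be a binary file object for the reply
    channel (hypothetical helper, not part of git).
    """
    header = stream.readline()
    if not header.endswith(b"\n"):
        raise ValueError("truncated reply header")
    fields = header[:-1].split(b" ")
    if len(fields) != 3 or fields[1] != b"blob":
        raise ValueError("unexpected reply header: %r" % header)
    dataref, _, length = fields
    size = int(length)
    # Read exactly <length> bytes of blob data, then the trailing LF.
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated blob data")
    if stream.read(1) != b"\n":
        raise ValueError("missing LF after blob data")
    return dataref.decode("ascii"), data


# Simulated reply; a real frontend would read from the reply channel.
reply = io.BytesIO(b"7c8987a987ca98c blob 6\nhello\n\n")
dataref, data = read_cat_blob_reply(reply)
```

Reading the whole reply before sending further commands matches the
rule above that the frontend must consume the reply in its entirety.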
Even once git 1.7.4 is out, it should be possible to make
improvements using a new "feature" name.

filemodify (M) with trees
-------------------------
fast-import 1.7.3-rc0 introduced the ability for a filemodify (M)
command to place a tree named by mark or other <dataref> at a given
path, replacing whatever was there before.  The implementation had
some kinks, which fast-import 1.7.4-rc0 ironed out.

Without some way to specify marks or learn tree names out of band, it
is not very useful.  With some way to learn tree names, it can be
used, for example, to rewrite revision metadata while reusing the old
tree data:

    commit refs/heads/master
    mark :11
    committer A U Thor <author@xxxxxxxxxxx> Wed, 26 Jan 2011 15:14:11 -0600
    data <<EOF
    New change description
    EOF
    M 040000 4b825dc642cb6eb9a060e54bf8d69288fbee4904 ""

There is no "feature" name for this.

Corner case: a command to replace a path with the empty tree is
interpreted[2] as meaning to remove that file or subtree, because git
does not track empty directories.

Do the semantics seem reasonable?  Should this get a corresponding
"feature"?

ls command
----------
A patch in flight[3] introduces an "ls" command to read directory
entries from the active commit or a named commit.  This allows
printing a blob from the active commit or copying a blob or tree from
a previous commit for use in the current one.

It works like so:

1. Frontend writes

    'ls' SP <path> LF

   or

    'ls' SP <dataref> SP <path> LF

   In the first form, the <path> _must_ be surrounded in quotes and
   quoted C-style.  In the second form, the <dataref> can refer to a
   tag, commit, or tree.

2. Backend replies through the cat-blob channel:

    <mode> SP <type> SP <dataref> HT <path> LF

   <mode> is a 6-digit octal mode: 040000, 100644, 100755, 120000, or
   160000 for a directory, regular file, executable file, symlink, or
   submodule, respectively.

   <type> is 'blob', 'tree', or 'commit'.

   <dataref> represents the corresponding blob, tree, or commit
   object.

   <path> is the path in question.
   It can be quoted C-style and must be if the path starts with '"'
   or contains a newline.

3. Frontend reads the reply.  The frontend might use that <dataref>
   in a later filemodify (M) or cat-blob command.

Proposed updates to svn-fe[1] use this heavily and work well.

One ugly corner case: although it is intended to allow "missing
<path>" as a reply when the path is missing, the proposed patch makes
git fast-import use an empty tree to signal that case, to ensure that,
for example,

    ls ""
    M <mode> <dataref> ""

is always a non-operation.

No "feature" name yet.  Even better, it's not part of git yet so I
invite you to nitpick to your heart's content.  Maybe you'd rather
the command be called "ls-tree" instead of "ls"?  Ask away. :)

Thoughts welcome, as always.
Jonathan

[1] http://repo.or.cz/w/git/jrn.git/blob/refs/heads/vcs-svn-pu:/vcs-svn/svndump.c
[2] Or rather, is not interpreted but ought to be, or else fast-import
will make it too easy to produce invalid commits.  One of the patches
in series [3] fixes it.
[3] http://thread.gmane.org/gmane.comp.version-control.git/162698/focus=164448
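
P.S. For anyone who wants something concrete to poke at, here is a
minimal Python sketch of splitting one reply line of the proposed
"ls" command into its fields.  It is an illustration only: the names
parse_ls_reply and MODES are hypothetical, the sample line is made
up, and since the command is still a proposal the format may change.
The sketch leaves <path> exactly as received and does not undo
C-style quoting.

```python
# Modes listed in the proposal and what each one stands for.
MODES = {
    "040000": "directory",
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symlink",
    "160000": "submodule",
}


def parse_ls_reply(line):
    """Split '<mode> SP <type> SP <dataref> HT <path> LF' into fields.

    Hypothetical helper; the path is returned as received, so a
    C-style quoted path still carries its quotes and escapes.
    """
    if not line.endswith("\n"):
        raise ValueError("reply is not terminated by LF")
    # The first HT separates the metadata from the path.
    meta, sep, path = line[:-1].partition("\t")
    if not sep:
        raise ValueError("reply has no HT before the path")
    fields = meta.split(" ")
    if len(fields) != 3:
        raise ValueError("expected <mode> <type> <dataref>")
    mode, kind, dataref = fields
    if mode not in MODES:
        raise ValueError("unknown mode: " + mode)
    if kind not in ("blob", "tree", "commit"):
        raise ValueError("unknown type: " + kind)
    return mode, kind, dataref, path


# Made-up sample reply line for illustration.
entry = parse_ls_reply("100644 blob 7c8987a987ca98c\tsrc/hello.c\n")
```

The returned <dataref> is what a frontend would pass to a later
filemodify (M) or cat-blob command.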