Re: GSOC Proposal draft: git-remote-svn

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 2 Apr 2012 19:09:45 -0500

Andrew Sayers wrote:

> Sorry, that wasn't clear.  I meant commands that just expose a single
> primitive bit of functionality (like git-commit-tree) instead of those
> that present an abstract interface to the whole git machinery (like
> git-fast-import).

Ok.  I think you are misunderstanding the purpose of fast-import[1] but
it doesn't take away from what you're saying.

> I agree it's possible to use fast-import for this problem, but it seems
> like it's redundant after svn-fe has already loaded everything into git.

Right, I missed your point here before.  The fundamental question is
not about what commands to use but about the order of operations.

1. In one scheme, first you import the whole tree without splitting it
   into branches, with a tool like svn-fe.  Afterwards, you
   postprocess the resulting repository with tools like "git
   filter-branch --subdirectory-filter".  The result of the import can
   depend on all revisions --- you can say, in rev 1, "I'm not sure
   whether this new directory is a branch; let me see how it develops
   by rev 1000 to decide how to process it".

2. In another scheme, you only import the subset of the repository
   you are interested in.  This is what git-svn does, for example.
   This requires the branch discovery to happen at the same time as
   the import, because otherwise there is no way to tell what subset
   of the repository you are actually interested in.

3. Lastly, in yet another scheme, you import the whole tree and it is
   split into branches on the fly.  The advantages relative to (1) are:

   - impatient people can peek at the partial result of the import as
     it happens

   - the result of importing rev n is guaranteed to depend only on
     revs <= n, so different people importing at different times will
     get the same commits (assuming nobody is rewriting early history
     behind the scenes) and it is obvious how to support incremental
     imports that expand a repository with all revs <= n to a
     repository with all revs <= 2n

   However, if splitting branches can only happen during the initial
   import, that makes it harder to tweak the configuration and try
   again to see what changes.
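
To make the "depends only on revs <= n" property of scheme (3) concrete,
here is a minimal sketch, not svn-fe's actual behaviour.  It assumes a
fixed trunk/branches/tags layout, and the names BRANCH_RE, split_path
and import_revs are hypothetical.  Each revision is split into branches
using only the paths it touches, so resuming a partial import gives the
same result as importing everything in one go.

```python
import re

# Assumption: a conventional trunk/branches/<name>/tags/<name> layout.
# A real importer would take this rule from configuration.
BRANCH_RE = re.compile(r"^(trunk|branches/[^/]+|tags/[^/]+)(?:/(.*))?$")


def split_path(path):
    """Return (branch, path inside the branch), or None if outside the layout."""
    m = BRANCH_RE.match(path)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""


def import_revs(state, revs):
    """Feed revisions in order; each one is (revnum, [changed paths]).

    `state` maps a branch name to a list of (revnum, sorted inner paths).
    It is updated using only the revision at hand, never later ones.
    """
    for revnum, paths in revs:
        per_branch = {}
        for path in paths:
            hit = split_path(path)
            if hit is not None:
                branch, inner = hit
                per_branch.setdefault(branch, []).append(inner)
        for branch, inner_paths in per_branch.items():
            state.setdefault(branch, []).append((revnum, sorted(inner_paths)))
    return state
```

Importing revs 1..n and later feeding revs n+1..2n into the same state
yields the same per-branch history as a single import of revs 1..2n,
which is the incremental case described above.  The cost is also
visible: changing BRANCH_RE means running the whole import again.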

The relevant technical difference is what information each scheme can
use.  In the naive implementation of scheme (2) you can make use of
arbitrary information available over the svn protocol; in naive scheme
(3) you can only use information that makes it into the fast-import
stream; and in naive scheme (1) you can only use information that makes
it into the actual git repository.  So to use scheme (1) you need to
make sure svn-fe stores all interesting data in a visible way, including
copyfrom info (which is not a bad idea anyway).

[...]
> The point I was making in IRC was that (so far as I understand)
> fast-import doesn't let you pass trees around in this way, but instead
> requires you to transmit the contents of all the changed files.

fast-import's "ls" command allows exactly what you are talking about,
and svn-fe uses it to copy subtrees from earlier revs into later ones
when it receives an "svn cp" command.
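
As a rough illustration of that exchange (a sketch based on my reading
of the git-fast-import documentation, not svn-fe's code): the frontend
sends "ls <dataref> <path>", reads back a line of the form
"<mode> <type> <dataref>\t<path>", and then reuses that dataref in an
"M" command inside the commit it is building.  The helper names are
hypothetical, and paths are assumed to need no quoting.

```python
def ls_command(dataref, path):
    """Build an 'ls' request for `path` inside the tree named by `dataref`.

    `dataref` is a mark such as ':5' or a full object name.
    """
    return f"ls {dataref} {path}\n"


def parse_ls_response(line):
    """Parse one 'ls' response line; return None if the path is missing."""
    line = line.rstrip("\n")
    if line.startswith("missing "):
        return None
    # The metadata and the path are separated by a single tab.
    meta, path = line.split("\t", 1)
    mode, kind, dataref = meta.split(" ")
    return {"mode": mode, "type": kind, "dataref": dataref, "path": path}


def copy_command(entry, dest_path):
    """Build the 'M' line that places the listed object at `dest_path`.

    This line is only valid inside a 'commit' command in the stream.
    """
    return f"M {entry['mode']} {entry['dataref']} {dest_path}\n"
```

For a directory the response carries mode 040000 and type "tree", so
the resulting "M" line copies the whole subtree by name without the
frontend retransmitting any file contents.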

See [2] for some work that predates that.

Did I understand correctly?
Jonathan

[1] By acting as a single process that takes a stream of commands it
really is able to do something that no other plumbing command can do.
[2] http://thread.gmane.org/gmane.comp.version-control.git/158375