Re: [GSoC update extra!] git-remote-svn: Week 8

Ramkumar Ramachandra <artagnon@xxxxxxxxx> · Wed, 30 Jun 2010 14:45:53 +0200

Hi Sam,

Sam Vilain writes:
> On Thu, 2010-06-24 at 13:07 -0500, Jonathan Nieder wrote:
> > operation.  In other words, it needs the tree for
> > http://path/to/some/svn/root/branches@r11.  This does not correspond
> > to a single git tree, since the content of each branch has been given
> > its own commit.
> 
> I wrote at length about this near the beginning of the project;
> essentially, figuring out whether particular paths are roots or not is
> not defined, as SVN does not distinguish between them (a misfeature
> cargo culted from Perforce).  It becomes a data mining problem, you have
> this scattered data, and you have to find a history inside.

Right. Implementing git-svn on top of git-remote-svn might not be a
bad idea.

> As I recommended before, it probably makes more sense to keep a "remote
> tracking" branch which mirrors the *entire* repository, and sort out
> efficient ways to convert SVN revision paths like the above into tree
> IDs.
> 
> I consider it very important to separate the data import and tracking
> stage from the data mining stage.

We're following this approach. At the moment, we're focusing only on
getting all the data from SVN directly into the Git store. Instead of
building trees for each SVN revision, we've found a way to do it
inside the Git object store. We're currently ironing out the details,
and I'll post an update about this shortly.
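As a rough illustration of the tracking approach Sam describes (a branch that mirrors the whole repository, one commit per SVN revision), turning a reference like http://path/to/some/svn/root/branches@r11 into a tree ID could come down to building a revision spec of the form `<commit>:<path>`, which `git rev-parse` resolves to the object at that path. This is a minimal sketch under stated assumptions, not the project's code: the revision-to-commit map (`rev_map`), the helper names, and the acceptance of both `@11` and `@r11` are hypothetical.

```python
import subprocess
from typing import Dict, Tuple


def parse_svn_ref(ref: str, root: str) -> Tuple[str, int]:
    """Split '<root>/branches@r11' (or '@11') into ('branches', 11)."""
    url, sep, rev = ref.rpartition("@")
    rev = rev[1:] if rev[:1] == "r" else rev  # tolerate the 'r11' spelling
    if not sep or not rev.isdigit():
        raise ValueError(f"no @REV peg revision in {ref!r}")
    root = root.rstrip("/")
    if url != root and not url.startswith(root + "/"):
        raise ValueError(f"{url!r} is not under {root!r}")
    return url[len(root):].strip("/"), int(rev)


def tree_spec(ref: str, root: str, rev_map: Dict[int, str]) -> str:
    """Build '<commit>:<path>' for the mirror commit of that revision.

    rev_map is a hypothetical SVN-revision -> mirror-commit table that
    the importer would have to maintain; KeyError if rev not imported.
    """
    path, rev = parse_svn_ref(ref, root)
    return f"{rev_map[rev]}:{path}"  # empty path -> root tree of commit


def tree_id(spec: str, git_dir: str) -> str:
    """Resolve a '<commit>:<path>' spec to an object ID via git."""
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "rev-parse", "--verify", spec],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()
```

Against a repository that actually holds such a mirror, `tree_id(tree_spec(ref, root, rev_map), ".git")` would return the object ID; when the path names a directory, `git cat-file -t` on it should report `tree`.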

> Once the data mining stage is well solved, then it makes sense to look
> at ways that a tracking branch which only tracks a part of the
> Subversion repository can be achieved.  In the simple case, where no
> repository re-organisation or cross-project renames have occurred it is
> relatively simple.  But in general I think this is a harder problem,
> which cannot always be solved without intervention - and so not
> necessary to be solved in short-term milestones.  As you are
> discovering, it is a can of worms which you avoid if you know you always
> have the complete SVN repository available.

Right. I'm not convinced that it necessarily requires user
intervention, though. Can you show, with an example, that the
necessary information cannot be recovered without user intervention?
Or is it possible in principle, but simply too difficult (and not
worth the effort) to mine the data out?
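For concreteness, a rough sketch of how the "simple case" Sam mentions might be recognised mechanically: walk the changed-path records of a full-repository mirror in revision order, follow one branch prefix, and stop at the first revision where a plain prefix filter is no longer trustworthy (the prefix or one of its parents is deleted or replaced, or content is copied in from outside it). The record layout is modelled loosely on the changed-paths output of `svn log -v`; the names `Change`, `under` and `simple_history`, and the choice of what counts as "breaking", are assumptions. The rule is deliberately conservative, so it also flags histories (for example, a merge that adds files with copy-from information) that a smarter miner might still resolve.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Change:
    """One changed-path entry, loosely modelled on `svn log -v` output."""
    rev: int
    action: str                     # 'A', 'M', 'D' or 'R'
    path: str                       # e.g. '/branches/foo/README'
    copyfrom: Optional[str] = None  # copy source path, if any


def under(path: str, prefix: str) -> bool:
    """True if path is prefix itself or lies inside it."""
    return path == prefix or path.startswith(prefix + "/")


def simple_history(changes: Iterable[Change],
                   prefix: str) -> Tuple[List[int], Optional[int]]:
    """Follow one branch prefix through a full-repository change log.

    Returns (revisions touching the prefix, first revision where a
    plain prefix filter is no longer safe, or None if there is none).
    """
    revs: List[int] = []
    for c in sorted(changes, key=lambda ch: ch.rev):
        if under(prefix, c.path) and c.action in ("D", "R"):
            return revs, c.rev      # prefix or a parent deleted/replaced
        if not under(c.path, prefix):
            continue                # elsewhere, or a parent added/modified
        if c.path != prefix and c.copyfrom and not under(c.copyfrom, prefix):
            return revs, c.rev      # content copied in from elsewhere
        if not revs or revs[-1] != c.rev:
            revs.append(c.rev)      # branch creation or its own edits, once
    return revs, None
```

A `None` in the second slot means the prefix filter alone reproduces that branch's history; a revision number marks the point where either user intervention or a more capable mining step would have to take over.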

-- Ram