Re: CVS -> SVN -> Git

"David Frech" <david@xxxxxxxxxxxxxxxxxx> · Sat, 14 Jul 2007 16:23:27 -0700

Now that this party is really rollicking, I think I'll join in. ;-)

I have a modest svn repo (about 800 commits) that contains fifteen or
so small projects. It started life as a CVS repo, and as the projects
grew and changed, and as I learned more about CVS, things got moved
around. Later, when I got interested in svn (in 2005) I converted the
repo, using cvs2svn. It got a few things wrong - mostly, that it
thought there was one project in the repo, and created toplevel
trunk/, branches/, and tags/ directories, and lumped everything below
these.

So, in svn, I moved things around some more.

Now I want to switch to git. I've since added enough to svn that there
is no option but to use th svn repo as my source. git-svnimport
doesn't work for me because its idea of the structure of my repo is
too limited. I looked around, stumbled over fast-import, and got
hooked on the idea of using it. It seemed simple enough... I wrote a
350-line Lua (!!) program that parses the svn dump file and creates a
commit stream for fast-import.

It took a day and half to get the svn dump parsing right (it's an
egregiously bad format) but only a couple of hours to write the
fast-import backend.

The code "works" in the sense that it can read an svn dump and create
a git repo that looks reasonable, but it misses a few things, like
properly inferring branch creation from the "copyfrom" info in the svn
dump.

However, it's fairly fast (~35 commits/sec) and flexible. I want to,
in the process of doing this conversion, "canonicalize" the structure
of the repo and throw away all the commits from cvs and svn that just
moved things around. This poses another inference challenge, but
having a modest simple tool (ie, a short enough program to easily
understand and modify) helps.

Having done all this, I realized that this is a good way to go.
Separating, as Michael suggests, the "parsing" part from the "commit
generating" part, not only makes the tools easier to write, but makes
them more flexible. If hg or bzr had a git-like fast-import (maybe
they do) it would take me about 35 minutes to target that instead. And
in the process I came across some "missing features" in fast-import,
which Shawn Pearce was able to quickly add.

My repo is tiny, but I still think that speed and flexibility are key
in this process. If I can write a little script that can be useful to
someone with 100k commits instead of my measly 800, that's great.

For that matter, fast-import is a fairly short program. It wouldn't be
hard for other scm projects to do something similar. fast-import could
become a "standard" intermediate format. But even if that doesn't
happen, the amounts of code we're talking about (to do parsing and
commit generation) are reasonably modest and easy to change.

As soon as I make a bit more progress I'm going to make my code available.

Cheers,

- David

On 7/14/07, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
My idea is not to built (for example) cvs2git; rather, I'd like cvs2svn
to be split conceptually into two tools:

cvs2<abstract_description_of_cvs_history>, whose job it is to determine
the most likely "true" CVS history based on the data stored in the CVS
repository, and

<abstract_description_of_cvs_history>2svn

Then later write

<abstract_description_of_cvs_history>2git
<abstract_description_of_cvs_history>2hg

etc.

The first split is partly done in cvs2svn 2.0.  And I naively imagine
that writing the new output back ends won't be all that much work.

Michael

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
If I have not seen farther, it is because I have stood in the
footsteps of giants.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html