Now that this party is really rollicking, I think I'll join in. ;-)

I have a modest svn repo (about 800 commits) that contains fifteen or so small projects. It started life as a CVS repo, and as the projects grew and changed, and as I learned more about CVS, things got moved around. Later, when I got interested in svn (in 2005), I converted the repo using cvs2svn. It got a few things wrong - mostly, it assumed there was one project in the repo, created toplevel trunk/, branches/, and tags/ directories, and lumped everything below them. So, in svn, I moved things around some more.

Now I want to switch to git. I've since added enough to svn that there is no option but to use the svn repo as my source. git-svnimport doesn't work for me because its idea of the structure of my repo is too limited. I looked around, stumbled over fast-import, and got hooked on the idea of using it. It seemed simple enough...

I wrote a 350-line Lua (!!) program that parses the svn dump file and creates a commit stream for fast-import. It took a day and a half to get the svn dump parsing right (it's an egregiously bad format) but only a couple of hours to write the fast-import backend. The code "works" in the sense that it can read an svn dump and create a git repo that looks reasonable, but it misses a few things, like properly inferring branch creation from the "copyfrom" info in the svn dump. However, it's fairly fast (~35 commits/sec) and flexible.

In the process of doing this conversion, I also want to "canonicalize" the structure of the repo and throw away all the commits from cvs and svn that just moved things around. This poses another inference challenge, but having a modest, simple tool (i.e., a program short enough to easily understand and modify) helps.

Having done all this, I realized that this is a good way to go. Separating, as Michael suggests, the "parsing" part from the "commit generating" part not only makes the tools easier to write, but also makes them more flexible.
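To give a feel for how little the commit-generating half involves, here is a minimal sketch in Python (my actual tool is in Lua, but the stream format is the same either way). The function and variable names here are illustrative, not from my program; it emits one "commit" command in the fast-import stream format, with file contents written inline.

```python
def data_block(text):
    """fast-import 'data' command: exact byte count, then the raw bytes."""
    raw = text.encode("utf-8")
    return b"data %d\n" % len(raw) + raw + b"\n"

def emit_commit(branch, committer, when, message, files):
    """Build one 'commit' command for git fast-import.

    files maps path -> content; every file is written inline with
    mode 100644 (a plain, non-executable file).  A real converter
    would also emit marks and 'from' lines to link commits together.
    """
    out = [b"commit refs/heads/%s\n" % branch.encode()]
    out.append(b"committer %s %d +0000\n" % (committer.encode(), when))
    out.append(data_block(message))
    for path, content in sorted(files.items()):
        out.append(b"M 100644 inline %s\n" % path.encode())
        out.append(data_block(content))
    out.append(b"\n")
    return b"".join(out)

# Example identity and contents, purely for illustration:
stream = emit_commit(
    "master",
    "David <david@example.com>",
    1184400000,
    "Imported from svn r1\n",
    {"README": "hello\n"},
)
```

Piping a stream like this into "git fast-import" from inside a fresh repo is all the backend has to do; the only fussy parts are getting the byte counts in the data commands exactly right and wiring up the parent links.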
If hg or bzr had a git-like fast-import (maybe they do), it would take me about 35 minutes to target that instead. And in the process I came across some "missing features" in fast-import, which Shawn Pearce was able to quickly add.

My repo is tiny, but I still think that speed and flexibility are key in this process. If I can write a little script that can be useful to someone with 100k commits instead of my measly 800, that's great. For that matter, fast-import is a fairly short program; it wouldn't be hard for other scm projects to do something similar, and fast-import could become a "standard" intermediate format. But even if that doesn't happen, the amounts of code we're talking about (for parsing and commit generation) are reasonably modest and easy to change.

As soon as I make a bit more progress I'm going to make my code available.

Cheers,
- David

On 7/14/07, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> My idea is not to build (for example) cvs2git; rather, I'd like cvs2svn
> to be split conceptually into two tools:
>
>   cvs2<abstract_description_of_cvs_history>, whose job it is to
>   determine the most likely "true" CVS history based on the data
>   stored in the CVS repository, and
>
>   <abstract_description_of_cvs_history>2svn
>
> Then later write
>
>   <abstract_description_of_cvs_history>2git
>   <abstract_description_of_cvs_history>2hg
>
> etc. The first split is partly done in cvs2svn 2.0. And I naively
> imagine that writing the new output back ends won't be all that much
> work.
>
> Michael
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
If I have not seen farther, it is because I have stood in the
footsteps of giants.