On 8/1/07, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Michael Haggerty wrote:
>
> > I am the maintainer of cvs2svn[1], which is a program for one-time
> > conversions from CVS to Subversion. cvs2svn is very robust against the
> > many peculiarities of CVS and can convert just about every CVS
> > repository we have ever seen.
> >
> > I've been working on a cvs2svn output pass that writes the converted CVS
> > repository directly into git rather than Subversion. The code runs now
> > with at least one repository from our test suite of nasty CVS repositories.
>
> Have you contacted Jon Smirl about his unpublished work on cvs2git,
> cvs2svn based CVS to Git converter?

My converter was derived from Michael's cvs2svn code. The bulk of my
work was converting cvs2svn to output in a format that git-fast-import
could consume. This was all rather straightforward and there was
nothing really interesting in the code. What it exposed were the
fundamental difficulties of reconstructing a change set history from
CVS, which doesn't record all of the needed info. I was never able to
construct a satisfactory git representation of the Mozilla CVS
repository. Michael has had a long time to work on the change set
detection code and he's probably added some new strategies.

My code did include a CVS file parser for extracting all the revisions
from the file in a single pass. Doing that is a major performance
benefit. I believe I posted the code to the cvs2svn mailing list. It
was about 200 lines of code.

Forking off cvs a million times to extract the revisions takes days to
run. Same goes for forking git a million times. git-fast-import reads
from a pipe fed by cvs2svn, so nothing has to be forked per revision.
git-fast-import also uses a technique from the database world for bulk
import: it imports everything without indexing it. Indexing is done
after the import finishes.
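To make the pipe approach concrete, here is a minimal sketch in Python.
It is not the cvs2git code. The Revision type and the function names are
hypothetical, and it emits one commit per file revision on a single
branch, which sidesteps the change set detection problem entirely. It
also assumes paths that need no quoting in the stream.

```python
import subprocess
from typing import BinaryIO, Iterable, NamedTuple


class Revision(NamedTuple):
    """One file revision to import (hypothetical type for this sketch)."""
    path: str
    content: bytes
    author: str      # "Name <email>"
    timestamp: int   # seconds since the epoch, taken as UTC
    message: str


def write_stream(revisions: Iterable[Revision], out: BinaryIO,
                 ref: str = "refs/heads/master") -> int:
    """Write revisions as a fast-import stream; return the commit count."""
    mark = 0
    parent = None
    count = 0
    for rev in revisions:
        # Send the file content as a blob and remember its mark.
        mark += 1
        blob_mark = mark
        out.write(b"blob\nmark :%d\ndata %d\n" % (blob_mark, len(rev.content)))
        out.write(rev.content + b"\n")

        # Send a commit that refers to the blob by mark, not by hash.
        mark += 1
        msg = rev.message.encode("utf-8")
        out.write(b"commit %s\nmark :%d\n" % (ref.encode("utf-8"), mark))
        out.write(b"committer %s %d +0000\n"
                  % (rev.author.encode("utf-8"), rev.timestamp))
        out.write(b"data %d\n" % len(msg))
        out.write(msg + b"\n")
        if parent is not None:
            out.write(b"from :%d\n" % parent)
        out.write(b"M 100644 :%d %s\n\n"
                  % (blob_mark, rev.path.encode("utf-8")))
        parent = mark
        count += 1
    return count


def run_import(repo_dir: str, revisions: Iterable[Revision]) -> int:
    """Feed the whole stream to a single git fast-import process."""
    proc = subprocess.Popen(["git", "fast-import", "--quiet"],
                            cwd=repo_dir, stdin=subprocess.PIPE)
    count = write_stream(revisions, proc.stdin)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("git fast-import failed")
    return count
```

Calling run_import() on an existing repository starts git exactly once,
however many revisions go through the pipe.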
Between parsing the CVS files internally and Shawn's git-fast-import,
it was possible to import Mozilla CVS (2.4GB) in about 2 hours and
generate a 450MB pack file. You need 3GB of RAM to do this; if the
process starts swapping, it will take weeks to finish.

> Quote from InterfacesFrontendsAndTools page on GIT wiki[1]:
>
>   cvs2git is the unofficial name of Jon Smirl's modifications to cvs2svn.
>   These modifications allow cvs2svn to generate a data stream which is
>   consumed by Shawn Pearce's git-fast-import (now included in git.git).
>   git-fast-import converts its input stream directly into a Git .pack file,
>   minimizing the amount of IO required on large imports.
>
>   Jon Smirl stopped working on cvs2git[2] because first, Mozilla (which was
>   the main target of his work) decided not to move to git, and second
>   because of troubles with the cvs2svn architecture[*] (which it is based on).
>   Jon Smirl has posted his impressions of working on a CVS importer in the
>   "Some tips for doing a CVS importer" thread[3].
>
> References:
> -----------
> [1] http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-23858c2cde0cef60443d8e73e6829a95f8e191ef
> [2] http://msgid.gmane.org/9e4733910611190940y147992b8mbdfac5a51f42e0fe@xxxxxxxxxxxxxx
> [3] http://marc.theaimsgroup.com/?t=116405956000001&r=1&w=2
>
> Footnotes:
> ----------
> [*] If I remember correctly, the authors of cvs2svn were talking about
>     separating the code dealing with disentangling CVS repository structure
>     from the part translating it into a Subversion repository (with its
>     quirks), and the part generating the Subversion repository.
>
> --
> Jakub Narebski
> Warsaw, Poland
> ShadeHawk on #git

--
Jon Smirl
jonsmirl@xxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html