Re: cvs2svn conversion directly to git ready for experimentation

On 8/1/07, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Michael Haggerty wrote:
>
> > I am the maintainer of cvs2svn[1], which is a program for one-time
> > conversions from CVS to Subversion. cvs2svn is very robust against the
> > many peculiarities of CVS and can convert just about every CVS
> > repository we have ever seen.
> >
> > I've been working on a cvs2svn output pass that writes the converted CVS
> > repository directly into git rather than Subversion. The code runs now
> > with at least one repository from our test suite of nasty CVS repositories.
>
> Have you contacted Jon Smirl about his unpublished work on cvs2git,
> cvs2svn based CVS to Git converter?

My converter was derived from Michael's cvs2svn code. The bulk of my
work was converting cvs2svn to output a format that git-fastimport
could consume. This was all rather straightforward, and there was
nothing really interesting in the code.

What it exposed were the fundamental technical difficulties of
reconstructing a change set history from CVS, which never recorded all
of the needed info.  I was never able to construct a satisfactory git
representation of the Mozilla CVS repository.  Michael has had a long
time to work on the change set detection code and he's probably added
some new strategies.

My code did include a CVS file parser for extracting all the revisions
from the file in a single pass. Doing that is a major performance
benefit.  I believe I posted the code to the cvs2svn mailing list. It
was about 200 lines of code. Forking off cvs a million times to
extract the revisions takes days to run.
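
As a rough illustration of the single-pass idea (this is not the code
posted to the cvs2svn list): an RCS ",v" file stores the head revision
in full and each older trunk revision as a reverse delta of "a" (add)
and "d" (delete) commands, so every trunk revision can be rebuilt by
walking the deltas once. The sketch below assumes the file has already
been split into the head text and a newest-to-oldest list of deltas;
the function names are hypothetical, and branches (which use forward
deltas) are left out.

```python
def apply_rcs_delta(lines, delta):
    """Apply one RCS delta (the `diff -n` command format) to a list of lines.

    `delta` is a list of strings: command lines such as "d2 1" or "a3 1",
    with each "a" command followed by the lines it adds. Line numbers in
    the commands are 1-based and refer to the input text.
    """
    out = []
    pos = 0  # index of the next input line not yet copied or deleted
    i = 0
    while i < len(delta):
        cmd = delta[i]
        i += 1
        op = cmd[0]
        start, count = (int(x) for x in cmd[1:].split())
        if op == "d":
            # Copy everything before line `start`, then skip `count` lines.
            out.extend(lines[pos:start - 1])
            pos = max(pos, start - 1 + count)
        elif op == "a":
            # Copy everything up to and including line `start`,
            # then emit the `count` added lines that follow the command.
            out.extend(lines[pos:start])
            pos = max(pos, start)
            out.extend(delta[i:i + count])
            i += count
        else:
            raise ValueError("unknown delta command: %r" % cmd)
    out.extend(lines[pos:])
    return out


def trunk_revisions(head_rev, head_lines, older):
    """Yield (revision, lines) for the head and each older trunk revision.

    `older` is a list of (revision, delta) pairs, newest first.
    """
    current = list(head_lines)
    yield head_rev, current
    for rev, delta in older:
        current = apply_rcs_delta(current, delta)
        yield rev, current
```

Each revision's text comes out of the same in-memory walk, so no
external cvs process is started per revision.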

The same goes for forking git a million times. git-fastimport reads
from a pipe fed by cvs2svn to avoid forking. git-fastimport also uses a
technique from the database world for bulk import: it imports
everything without indexing it, and the indexing is done after the
import finishes.
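
A minimal sketch of the one-pipe approach, assuming the converter has
already grouped file revisions into commits: every commit is written
as text to the standard input of a single "git fast-import" process
instead of starting git once per revision. The helper names, the
author string and the fixed +0000 timezone are illustrative
assumptions, not what cvs2svn or cvs2git actually do.

```python
import subprocess


def write_commit(out, mark, ref, committer, when, message, files, parent=None):
    """Write one commit in git fast-import stream format to binary stream `out`.

    `files` maps a path to its full contents as bytes; `parent` is the mark
    of the parent commit, or None for a root commit.
    """
    msg = message.encode("utf-8")
    out.write(b"commit %s\n" % ref.encode("utf-8"))
    out.write(b"mark :%d\n" % mark)
    out.write(b"committer %s %d +0000\n" % (committer.encode("utf-8"), when))
    out.write(b"data %d\n%s\n" % (len(msg), msg))
    if parent is not None:
        out.write(b"from :%d\n" % parent)
    for path, content in files.items():
        # "inline" means the file data follows directly in the stream.
        out.write(b"M 100644 inline %s\n" % path.encode("utf-8"))
        out.write(b"data %d\n%s\n" % (len(content), content))
    out.write(b"\n")


def import_commits(repo_dir, commits):
    """Feed all commits to a single `git fast-import` process over a pipe.

    `commits` is an iterable of dicts holding write_commit's keyword arguments.
    """
    proc = subprocess.Popen(["git", "fast-import"], cwd=repo_dir,
                            stdin=subprocess.PIPE)
    for commit in commits:
        write_commit(proc.stdin, **commit)
    proc.stdin.close()
    return proc.wait()
```

Only one git process is started for the whole import, however many
commits are written down the pipe.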

Between parsing the CVS files internally and Shawn's git-fastimport,
it was possible to import Mozilla CVS (2.4GB) in about 2 hours and
generate a 450MB pack file. You need 3GB of RAM to do this; if the
machine starts swapping, the process will take weeks to finish.

> Quote from InterfacesFrontendsAndTools page on GIT wiki[1]:
>
>   cvs2git is the unofficial name of Jon Smirl's modifications to cvs2svn.
>   These modifications allow cvs2svn to generate a data stream which is
>   consumed by Shawn Pearce's git-fast-import (now included in git.git).
>   git-fast-import converts its input stream directly into a Git .pack file,
>   minimizing the amount of IO required on large imports.
>
>   Jon Smirl stopped working on cvs2git[2] because first, Mozilla (which was
>   the main target of his work) decided not to move to git, and second
>   because of troubles with cvs2svn architecture[*] (which it is based on).
>   Jon Smirl has posted his impressions on working on CVS importer in
>   "Some tips for doing a CVS importer" thread[3].
>
> References:
> -----------
> [1] http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-23858c2cde0cef60443d8e73e6829a95f8e191ef
> [2] http://msgid.gmane.org/9e4733910611190940y147992b8mbdfac5a51f42e0fe@xxxxxxxxxxxxxx
> [3] http://marc.theaimsgroup.com/?t=116405956000001&r=1&w=2
>
> Footnotes:
> ----------
> [*] If I remember correctly authors of cvs2svn were talking about separating
> the code dealing with disentangling CVS repository structure from the part
> translating it into Subversion repository (with its quirks), and the part
> generating Subversion repository.
>
> --
> Jakub Narebski
> Warsaw, Poland
> ShadeHawk on #git
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Jon Smirl
jonsmirl@xxxxxxxxx