Re: I have end-of-lifed cvsps

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 19 Dec 2013 00:44:29 +0100

On 12/17/2013 07:47 PM, Eric S. Raymond wrote:
> Johan Herland <johan@xxxxxxxxxxx>:
>> However, I fear that you underestimate the number of users that want
>> to use Git against CVS repos that are orders of magnitude larger (in
>> both dimensions: #commits and #files) than your example repo.
> 
> You may be right. See below...
> 
> I'm working with Alan Barret now on trying to convert the NetBSD
> repositories. They break cvs-fast-export through sheer bulk of
> metadata, by running the machine out of core.  This is exactly
> the kind of huge case that you're talking about.
> 
> Alan and I are going to take a good hard whack at modifying cvs-fast-export 
> to make this work. Because there really aren't any feasible alternatives.
> The analysis code in cvsps was never good enough. cvs2git, being written
> in Python, would hit the core limit faster than anything written in C.

cvs2git goes to great lengths to store intermediate data on disk and
keep the working set small, so (despite the Python overhead) I am
confident that it scales better than cvs-fast-export.  My usual test
repo was gcc:

Total CVS Files:             25013
Total CVS Revisions:        578010
Total CVS Branches:        1487929
Total CVS Tags:           11435500
Total Unique Tags:             814
Total Unique Branches:         116
CVS Repos Size in KB:      2074248
Total SVN Commits:           64501

I also regularly converted mozilla (4.2 GB) and emacs (560 MB) for
testing purposes.  These could all be converted on a 32-bit computer.
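
To illustrate the general idea of keeping intermediate data on disk
rather than in core, here is a minimal sketch.  It is not cvs2git's
actual storage code; the table layout and the names spill_revisions
and iter_by_time are hypothetical, and it assumes each revision can be
reduced to a (path, revision, timestamp) record.

```python
import os
import sqlite3
import tempfile

def spill_revisions(db_path, revisions):
    """Write (path, rev, timestamp) records to an on-disk SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revs (path TEXT, rev TEXT, ts INTEGER)"
    )
    # executemany accepts any iterable, so a generator can feed it
    # one record at a time without building a full list first.
    conn.executemany("INSERT INTO revs VALUES (?, ?, ?)", revisions)
    conn.commit()
    return conn

def iter_by_time(conn):
    """Yield records in timestamp order; the database does the sorting."""
    cursor = conn.execute(
        "SELECT path, rev, ts FROM revs ORDER BY ts, path, rev"
    )
    for row in cursor:
        yield row

def fake_revisions():
    # Stand-in for a parser walking the ,v files (illustrative data only).
    yield ("a.c", "1.2", 200)
    yield ("b.c", "1.1", 100)
    yield ("a.c", "1.1", 50)

tmpdir = tempfile.mkdtemp()
conn = spill_revisions(os.path.join(tmpdir, "revs.db"), fake_revisions())
ordered = list(iter_by_time(conn))
```

The point of the design is that later passes stream records back from
disk in the order they need, so memory use depends on the size of one
record rather than on the size of the repository.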

Other projects that cvs2svn/cvs2git could handle: FreeBSD, Gentoo, KDE,
GNOME, PostgreSQL.  (Though for KDE, which I think was in the 16 GB
range, I know that they used a giant machine for the conversion.)

If you haven't tried cvs2git yet, please start it up somewhere in the
background.  It might take a while, but it should have no trouble with
your repos, and then you can compare the tools based on experience
rather than speculation.

> Which matters, because right now the set of people working on CVS lifters
> begins with me and ends with Michael Rafferty (cvs2git), who seems even
> less interested in incremental conversion than I am.  Unless somebody
> comes out of nowhere and wants to own that problem, it's not going
> to get solved.

A correct incremental converter could be done (as long as the CVS users
don't literally change history retroactively) but it would be a lot of
work.  Parsing the CVS files isn't the problem; after all, CVS has to do
that every time you check out a branch.  The problem is the extra
bookkeeping that would be needed to keep the overlapping history
consistent between runs N and N+1 of the tool.  I sketched out what
would be necessary once and it came out to several solid weeks of work.
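
A very small part of that bookkeeping can be sketched as follows,
purely as an illustration.  It assumes run N saves a fingerprint for
every revision it converted, and run N+1 compares what it now sees
against that saved state.  The "path@rev" key format and the names
fingerprint and compare_runs are hypothetical; a real tool would also
have to map changesets onto already-exported commits, which this
sketch does not attempt.

```python
import hashlib
import json

def fingerprint(meta):
    """Stable hash of one revision's metadata (author, log, ...)."""
    blob = json.dumps(meta, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()

def compare_runs(previous, current):
    """Classify revisions seen in run N+1 against state saved by run N.

    previous maps "path@rev" to a saved fingerprint string;
    current maps "path@rev" to the metadata dict seen now.
    """
    new, changed, missing = [], [], []
    for key, meta in current.items():
        if key not in previous:
            new.append(key)          # safe to convert incrementally
        elif previous[key] != fingerprint(meta):
            changed.append(key)      # history was altered retroactively
    for key in previous:
        if key not in current:
            missing.append(key)      # a revision disappeared
    return sorted(new), sorted(changed), sorted(missing)

# Illustrative data only.
run_n = {
    "foo.c@1.1": {"author": "alice", "log": "initial"},
    "foo.c@1.2": {"author": "bob", "log": "fix bug"},
}
saved = {key: fingerprint(meta) for key, meta in run_n.items()}

run_n1 = {
    "foo.c@1.1": {"author": "alice", "log": "initial"},
    "foo.c@1.2": {"author": "bob", "log": "fix bug (edited)"},
    "foo.c@1.3": {"author": "alice", "log": "new feature"},
}
new, changed, missing = compare_runs(saved, run_n1)
```

Only the revisions in "new" could be appended to the existing history;
anything in "changed" or "missing" means the overlap between the two
runs is no longer consistent and would need special handling.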

But the traffic on the cvs2svn/cvs2git mailing list has trailed off
essentially to zero, so either the software is perfect already (haha) or
most everybody has already converted.  Therefore I don't invest any
significant time in that project these days.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/