On 12/17/2013 07:47 PM, Eric S. Raymond wrote:
> Johan Herland <johan@xxxxxxxxxxx>:
>> However, I fear that you underestimate the number of users that want
>> to use Git against CVS repos that are orders of magnitude larger (in
>> both dimensions: #commits and #files) than your example repo.
>
> You may be right. See below...
>
> I'm working with Alan Barret now on trying to convert the NetBSD
> repositories. They break cvs-fast-export through sheer bulk of
> metadata, by running the machine out of core. This is exactly
> the kind of huge case that you're talking about.
>
> Alan and I are going to take a good hard whack at modifying cvs-fast-export
> to make this work. Because there really aren't any feasible alternatives.
> The analysis code in cvsps was never good enough. cvs2git, being written
> in Python, would hit the core limit faster than anything written in C.

cvs2git goes to great lengths to store intermediate data to disk and
keep the working set small and therefore (despite the Python overhead) I
am confident that it scales better than cvs-fast-export. My usual test
repo was gcc:

Total CVS Files:            25013
Total CVS Revisions:       578010
Total CVS Branches:       1487929
Total CVS Tags:          11435500
Total Unique Tags:            814
Total Unique Branches:        116
CVS Repos Size in KB:     2074248
Total SVN Commits:          64501

I also regularly converted mozilla (4.2 GB) and emacs (560 MB) for
testing purposes. These could all be converted on a 32-bit computer.

Other projects that cvs2svn/cvs2git could handle: FreeBSD, Gentoo, KDE,
GNOME, PostgreSQL. (Though for KDE, which I think was in the 16 GB
range, I know that they used a giant machine for the conversion.)

If you haven't tried cvs2git yet, please start it up somewhere in the
background. It might take a while but it should have no trouble with
your repos, and then you can compare the tools based on experience
rather than speculation.
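To make the "store intermediate data to disk" point concrete, here is a
minimal sketch of the general technique. It is not cvs2git's actual code:
the table layout, the function names, and the fake_revisions() helper
(a stand-in for a real RCS-file parser) are all assumptions made for
illustration. One pass streams revision records into an on-disk SQLite
database, and a later pass reads them back in timestamp order one row at
a time, so memory use stays roughly constant regardless of repository size.

```python
import os
import sqlite3
import tempfile


def store_revisions(db_path, revisions):
    """Pass 1: stream (path, rev, timestamp) records to an on-disk table."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS revs (path TEXT, rev TEXT, ts INTEGER)"
        )
        # executemany() consumes the iterable lazily, so the full list of
        # revisions never has to exist in memory at once.
        conn.executemany("INSERT INTO revs VALUES (?, ?, ?)", revisions)
    conn.close()


def iter_revisions_by_time(db_path):
    """Pass 2: yield records in timestamp order, one row at a time."""
    conn = sqlite3.connect(db_path)
    try:
        query = "SELECT path, rev, ts FROM revs ORDER BY ts, path, rev"
        for row in conn.execute(query):
            yield row
    finally:
        conn.close()


def fake_revisions(n_files, n_revs):
    """Hypothetical stand-in for a parser that walks RCS ,v files."""
    for f in range(n_files):
        for r in range(n_revs):
            yield ("src/file%d.c" % f, "1.%d" % (r + 1), 1000 * r + f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "revs.db")
        store_revisions(db, fake_revisions(n_files=3, n_revs=2))
        for path, rev, ts in iter_revisions_by_time(db):
            print(ts, path, rev)
```

The trade-off in a design like this is extra disk I/O in exchange for a
working set that does not grow with the number of revisions.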

> Which matters, because right now the set of people working on CVS lifters
> begins with me and ends with Michael Rafferty (cvs2git), who seems even
> less interested in incremental conversion than I am. Unless somebody
> comes out of nowhere and wants to own that problem, it's not going
> to get solved.

A correct incremental converter could be done (as long as the CVS users
don't literally change history retroactively) but it would be a lot of
work. Parsing the CVS files isn't the problem; after all, CVS has to do
that every time you check out a branch. The problem is the extra
bookkeeping that would be needed to keep the overlapping history
consistent between runs N and N+1 of the tool.

I sketched out what would be necessary once and it came out to several
solid weeks of work. But the traffic on the cvs2svn/cvs2git mailing
list has trailed off essentially to zero, so either the software is
perfect already (haha) or most everybody has already converted.
Therefore I don't invest any significant time in that project these days.

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/