Johan Herland <johan@xxxxxxxxxxx>:
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
>
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I ended up storing the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.

You make a good point. cvs2git is descended from cvs2svn, which has such
a multipass organization - it will only have to avoid memory limits per
pass. Alan and I will try that as a fallback if cvs-fast-export
continues to choke.

> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming
> > a black hole :-)
>
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.
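For concreteness, the blob-streaming pass described above can be caricatured in a few lines of Python. This is only a sketch under stubbed-in assumptions: the revision tuples are invented test data, and a real converter would resolve each revision's content from the RCS deltas in the ,v master files rather than take it as a byte string.

```python
import io

def emit_blob_stream(revisions, out):
    """revisions: iterable of (path, rev, content_bytes) tuples.
    Writes a git fast-import stream of blob commands to `out` and
    returns a {(path, rev): mark} table for a later commit-building
    pass to reference."""
    marks = {}
    for next_mark, (path, rev, content) in enumerate(revisions, start=1):
        # fast-import blob command: blob / mark :<n> / data <len> / bytes
        out.write(b"blob\nmark :%d\ndata %d\n" % (next_mark, len(content)))
        out.write(content + b"\n")
        marks[(path, rev)] = next_mark
    return marks

# Toy usage with made-up revisions; pipe `buf` to `git fast-import`.
buf = io.BytesIO()
table = emit_blob_stream(
    [("main.c", "1.1", b"int main(void){return 0;}\n"),
     ("main.c", "1.2", b"int main(void){return 1;}\n")],
    buf)
```

The returned table is the file:revision-to-blob mapping Johan mentions, except keyed by fast-import marks rather than SHA1s, so the streaming pass never needs to run git or compute hashes itself.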
Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works. The blob files get created during the walk
through the master file list, before actual topo analysis is done.

> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)

Well I know this... :-)

> > The question is how common such supermassive cases are. My own guess is
> > that the *BSD repos and a handful of the oldest GNU projects are pretty
> > much the whole set; everybody else converted to Subversion within the
> > last decade.
>
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...

If people with money want to hire me to slay those beasts, I'm
available. I'm not proud; I'll use cvs2git if I have to.

> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> >
> > Which matters, because right now the set of people working on CVS
> > lifters begins with me and ends with Michael Rafferty (cvs2git),
>
> s/Rafferty/Haggerty/?

Yup, I thinkoed.

> > who seems even
> > less interested in incremental conversion than I am. Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
>
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.

I think the fact that it hasn't happened already is a good clue that
it's not going to.
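Coming back to the metadata-correlation step Johan calls the hard part: the core heuristic that cvsps-style tools use can at least be sketched. This toy Python version (the field names and the 300-second fuzz window are illustrative assumptions, not anything taken from cvs-fast-export) groups per-file revisions that share an author and log message within a time window, never admitting the same file twice into one changeset:

```python
from operator import itemgetter

FUZZ = 300  # seconds; real tools make this window tunable

def coalesce(revisions):
    """revisions: dicts with keys file, rev, author, log, time (epoch).
    Returns a list of changesets, each a time-ordered list of per-file
    revisions sharing one author and one log message."""
    changesets = []
    for r in sorted(revisions, key=itemgetter("time")):
        for cs in changesets:
            if (cs[0]["author"] == r["author"]
                    and cs[0]["log"] == r["log"]
                    and r["time"] - cs[-1]["time"] <= FUZZ
                    and all(c["file"] != r["file"] for c in cs)):
                cs.append(r)  # fits an open changeset
                break
        else:
            changesets.append([r])  # no open changeset fits; start one
    return changesets

# Toy data: two revisions close together, a third much later.
sets = coalesce([
    {"file": "a.c", "rev": "1.2", "author": "esr", "log": "fix", "time": 100},
    {"file": "b.c", "rev": "1.5", "author": "esr", "log": "fix", "time": 130},
    {"file": "a.c", "rev": "1.3", "author": "esr", "log": "fix", "time": 900},
])
```

Branch and tag placement, the part that "really drives you mad", is exactly where this simple picture breaks down: CVS tags are per-file, so a tag need not line up with any single changeset this pass produces.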
Given the decline curve of CVS usage, writing git-cvs might have looked
like a decent investment of time once, but that era probably ended five
to eight years ago.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>