On 6/14/06, Keith Packard <keithp@xxxxxxxxxx> wrote:
On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
> In terms of history parsing, parsecvs and cvs2svn are similar. I like
> cvs2svn "many passes" approach better, though the Python source is
> really messy. A good thing about cvs2svn is that it is a lot more
> conservative WRT memory use.

I will try to fix parsecvs so it doesn't take so much memory. Of course, my goal was to import various X.org repositories which have horrible issues, but aren't all that huge. And, for them, it works just fine.
Would it be possible to have it parse the RCS histories from a remote repo? I had forgotten, but that's something else that the cvsps + git-cvsimport combo can do.

In short, to replace cvsps+git-cvsimport ...

 + not memory bound -- or at least must be able to import large repos (mozilla, gentoo) with a decent amount of memory
 + must work local and remote (of course local can be faster)
 + must do incrementals reasonably well
I'd like some help figuring out how to do incremental imports with parsecvs. As parsecvs already constructs the project history from the present into the past, it should be possible to "notice" when it hits existing bits in the repository and stop automatically. I think this will just take saving a bit of state in the git repository to mark where in CVS the tips of each branch come from.
Ok. Before starting to read the RCS files, I would look at all the branch tips in the git repo, and read some metadata of the last commit of each head into memory (author, commitmsg, timestamp, diffstat). When parsing RCS files and building changesets to import, compare them with the 'head' data.

The timestamp granularity is seconds, which is pretty coarse. You can ask for history after those timestamps, but there's the risk of missing commits (this affects git-cvsimport today, and I'm thinking about how to fix it there). So borderline changesets should be compared against the metadata you have. There is the chance that your earlier import caught a commit partway through, so you may end up putting in the 'rest' of the commit. That's why diffstat can be useful.

Is that useful?

cheers,

martin
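
A rough sketch of the comparison against the 'head' metadata, in Python for brevity. Everything here is an assumption made for illustration and is not taken from parsecvs or git-cvsimport: the Changeset record, the classify() and pending() helpers, the 300 second fuzz window, and the use of the set of touched paths as a stand-in for the diffstat. It only compares against a single branch head, so a different commit that shares a timestamp with an already imported ancestor would still need separate handling.

```python
from dataclasses import dataclass

FUZZ_SECONDS = 300  # assumed window for "the same CVS commit"; tune as needed


@dataclass(frozen=True)
class Changeset:
    """Hypothetical record for a git head or a changeset rebuilt from RCS."""
    author: str
    message: str
    timestamp: int      # seconds since the epoch (CVS granularity)
    files: frozenset    # paths touched, standing in for the diffstat


def classify(head, candidate, fuzz=FUZZ_SECONDS):
    """Return 'old', 'same', 'partial' or 'new' relative to the branch head."""
    looks_like_head = (
        candidate.author == head.author
        and candidate.message == head.message
        and abs(candidate.timestamp - head.timestamp) <= fuzz
    )
    if looks_like_head:
        # Same commit as the head. If it touches paths the head lacks,
        # the earlier import caught it partway through.
        return "same" if candidate.files <= head.files else "partial"
    if candidate.timestamp < head.timestamp:
        return "old"
    return "new"


def pending(head, changesets, fuzz=FUZZ_SECONDS):
    """List (changeset, paths still to import) pairs, oldest first."""
    todo = []
    for cs in sorted(changesets, key=lambda c: c.timestamp):
        kind = classify(head, cs, fuzz)
        if kind == "new":
            todo.append((cs, cs.files))
        elif kind == "partial":
            # Only the 'rest' of the half-imported commit is missing.
            todo.append((cs, cs.files - head.files))
    return todo
```

With a head that recorded only a.c for a commit, a rebuilt changeset with the same author and message touching a.c and b.c would come back from pending() with just b.c left to import, and anything clearly later comes back whole.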