Some tips for doing a CVS importer

"Jon Smirl" <jonsmirl@xxxxxxxxx> · Mon, 20 Nov 2006 16:49:17 -0500

I have tried all of the available CVS importers, and none of them is
without problems. If anyone is interested in writing one for git, here
are some ideas on how to structure it.

1) There is a working lex/yacc grammar for CVS files in the parsecvs
source code.
2) The first time you parse a CVS file, record everything and don't
parse it again.
3) When the file is first parsed, use the deltas to generate the
revisions and feed them to git-fastimport; just remember the SHA1 or
an id in the import code. This is a critical step to getting decent
performance.
4) If you do #1 and #2 you don't need to store CVS revision numbers
and file names in memory. Because of that you can easily do a
Mozilla import in 2GB, probably 1GB.
5) When comparing CVS revisions, only use the CVS timestamps as a last
resort; instead, use the dependency information in the CVS file.
6) Match up commits by using a SHA1 of the author and commit message.
7) After all files are loaded, match up the symbols and insert them
into the dependency chains. If a symbol depends on a branch commit,
the symbol lies on the branch; otherwise the symbol is on the trunk.
8) Do a topological sort to build the change set commit tree.
9) When you hit a loop in the tree, break up delta change sets until
the loop can be removed; don't break up symbol change sets.
10) Mozilla has some large commits that were made over dial-up, so a
single commit's change set can span hours. All of the pieces of such a
commit need to be merged into a single change set.
11) An algorithm needs to be developed for detecting branches merging
back into the trunk.
12) cvs2svn has excellent test cases; use them to test the new
importer. The cvs2svn code is quite nice, but it doesn't handle #7.
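
To make #6 and #10 a bit more concrete, here is a rough Python sketch
of grouping file revisions into change sets by a SHA1 of author and
commit message, with no time window at all. The FileRev record, the
function names, and the rule "start a new change set when the same file
shows up twice under one key" are my own assumptions for illustration;
they are not taken from parsecvs or cvs2svn. It also assumes each
file's revisions arrive in dependency order.

```python
import hashlib
from collections import namedtuple

# Hypothetical record for one file revision; field names are illustrative.
FileRev = namedtuple("FileRev", "path rev author message")


def commit_key(author, message):
    """SHA1 over author and log message, used to match up commits."""
    h = hashlib.sha1()
    h.update(author.encode("utf-8"))
    h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    h.update(message.encode("utf-8"))
    return h.hexdigest()


def group_changesets(revs):
    """Group revisions by commit key, ignoring timestamps entirely.

    Assumption: a file can appear only once in a change set, so a
    repeated path under the same key opens a new change set.
    """
    open_sets = {}  # key -> currently open change set (path -> FileRev)
    result = []
    for r in revs:
        key = commit_key(r.author, r.message)
        cs = open_sets.get(key)
        if cs is None or r.path in cs:
            cs = {}
            open_sets[key] = cs
            result.append(cs)
        cs[r.path] = r
    return [list(cs.values()) for cs in result]


revs = [
    FileRev("a.c", "1.1", "jon", "init"),
    FileRev("b.c", "1.1", "jon", "init"),
    FileRev("a.c", "1.2", "jon", "init"),  # same key, same file: new set
    FileRev("c.c", "1.1", "bob", "init"),  # different author: new set
]
changesets = group_changesets(revs)
```

Because nothing here looks at time, a commit that took hours over
dial-up still ends up as one change set, as long as the author and
message match and no file repeats.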
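
For #8 and #9, a minimal sketch of the topological sort, using Kahn's
algorithm (my choice, any topological sort would do). The input format
(a dict mapping each change set id to the ids it depends on) and the
name topo_sort are assumptions for illustration. Whatever cannot be
ordered is reported back as "stuck"; those are the change sets on or
behind a loop, and the delta change sets among them are the candidates
for breaking up. The splitting itself is not shown.

```python
from collections import deque


def topo_sort(deps):
    """Order change sets so that dependencies come first.

    deps: dict mapping a change set id to an iterable of ids it
    depends on. Returns (order, stuck), where stuck is the set of ids
    that could not be ordered because they are on or behind a loop.
    """
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)

    # Count unmet dependencies and build the reverse edges.
    pending = {n: len(set(deps.get(n, ()))) for n in nodes}
    children = {n: [] for n in nodes}
    for n, parents in deps.items():
        for p in set(parents):
            children[p].append(n)

    # Start from change sets with no dependencies (sorted for stable output).
    ready = deque(sorted(n for n in nodes if pending[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in sorted(children[n]):
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)

    stuck = nodes - set(order)
    return order, stuck


# No loop: everything gets ordered.
order, stuck = topo_sort({"B": ["A"], "C": ["B"], "D": ["A"]})

# X and Y depend on each other, so they are reported as stuck.
loop_order, loop_stuck = topo_sort({"X": ["Y"], "Y": ["X"], "Z": []})
```

An importer would presumably split one of the stuck delta change sets,
rebuild the dependencies, and run the sort again until nothing is
stuck.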

--
Jon Smirl
jonsmirl@xxxxxxxxx