On 12/22/2012 06:36 PM, Eric S. Raymond wrote:
> * One is Michael Haggerty's cvs2git.  I had bad experiences with the
>   cvs2svn code it's derived from in the past, but Michael believes those
>   problems have been fixed and I will accept that - at least until I can
>   test for myself.  Its documented interface is not quite good enough
>   yet; as the documentation says, "The data that should be fed to git
>   fast-import are written to two files, which have to be loaded into git
>   fast-import manually."

There are two good reasons that the output is written to two separate
files:

1. The files are generated during different passes of cvs2git, and since
   the cvs2git conversion is restartable pass-by-pass, the first file
   might only need to be generated once even while the user is iterating
   on adjustments to other conversion options.

2. The first ("blobfile") contains blob definitions for file revisions,
   which are read out of the RCS files in the order they are held in the
   RCS file.  This is vastly faster than reading the file revisions in
   the order that they are needed for git commits because (1) all
   revisions for a file can be computed from one serial read of the RCS
   file; (2) there is no need to jump around from rcsfile to rcsfile.

   The second ("dumpfile") stitches the blobs together into git commits
   by referring to the blobs that are needed.  This file is smaller
   because it doesn't contain the actual file contents.

   Another advantage of this approach is that a blob need only appear
   once in the blobfile even if it is used multiple times in the git
   history.

Anyway, surely cat'ing two output files together is not such a difficult
problem?

A potentially bigger problem is that if you want to handle such
blob/dump output, you have to deal with git-fast-import format's "blob"
command as opposed to only handling inline blobs.
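To make the two-file layout concrete, here is a minimal Python sketch of a detached blob that defines a mark, a commit that refers to that mark, and the concatenation step.  The helper names (blob_command, commit_command, concatenate), the committer identity, and the file names are hypothetical and chosen only for illustration; real cvs2git output carries more detail than this.

```python
import shutil
from pathlib import Path


def blob_command(mark: int, content: bytes) -> bytes:
    """Render a detached fast-import 'blob' command that defines a mark."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)


def commit_command(ref: str, message: bytes, files: dict) -> bytes:
    """Render a 'commit' command whose files refer to blobs by mark."""
    parts = [
        b"commit %s\n" % ref.encode(),
        # Hypothetical committer identity and timestamp, for illustration only.
        b"committer Example <example@example.com> 1356197760 +0000\n",
        b"data %d\n%s\n" % (len(message), message),
    ]
    for path, mark in files.items():
        # ':<mark>' points back at a blob defined earlier in the stream.
        parts.append(b"M 100644 :%d %s\n" % (mark, path.encode()))
    parts.append(b"\n")
    return b"".join(parts)


def concatenate(blobfile: Path, dumpfile: Path, out: Path) -> None:
    """Join the two files, blobs first, so each mark is defined before use."""
    with out.open("wb") as dst:
        for part in (blobfile, dumpfile):
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)
```

The combined file could then be fed to git fast-import on standard input from inside an initialized repository.  The order matters: the blobfile has to come first because the commits refer to marks that the blob commands define.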
However, if that is a problem, it is possible to configure cvs2git to
write the blobs inline with the rest of the dumpfile (this mode is
supported because "hg fast-import" doesn't support detached blobs).  You
would have to create an options file that uses GitRevisionInlineWriter,
similar to what is done in cvs2hg-example.options.

> [...]
> Having three different tools for this job seems to me duplicative and
> pointless; two of them should probably be let die an honorable death.
> I don't actually care which of the three survives - and, in
> particular, if I determine that cvs2git is doing the best job of the
> three I am quite willing to declare end-of-life for cvsps and
> parsecvs.  It's not like I don't have plenty of other projects to work
> on.

cvs2git does not currently support incremental conversions; therefore, a
cvsps-based option (if it would actually work, that is) would have at
least one advantage over cvs2git.

> I presently know of three test suites other than mine.  One was built
> by Heiko to test cvsps, another lives in the git t/ directory, and the
> third is cvs2git's.  I haven't looked at cvs2git's yet, but the others
> are not in their present form suited to where I am taking cvsps and
> parsecvs.  Heiko's relies on the default human-readable cvsps format,
> which I consider obsolete and uninteresting.  The git tests are
> dependent on details of porcelain behavior.  I think it would be
> better to test import-stream output.

cvs2svn has an extensive test suite which includes tests derived from
bug reports that we have received over the years.  I adapted a few of
its test repositories to create the git test suite additions that I made
in Feb 2009, but there are many more in our project.
A lot of our test suite deals with additional conversion features, like:

* Re-encoding filenames, usernames, and log messages from whatever
  happens to have been used in the CVS repository into UTF-8

* Fixing CVS branches, tags, and mixed branch/tag messes according to
  user wishes; renaming branches and tags

* Allowing the user to influence the choice of which branch should serve
  as the source for another branch/tag (CVS records this information
  very ambiguously)

* Fixing binary vs. text files, expanding/contracting CVS keywords, etc.

* Removing lots of synthetic revisions and other cruft generated by CVS
  to fit within the RCS file format

* Dealing with vendor branches in a sensible way, especially considering
  that very many users misuse vendor branches for initial imports

* Dealing with various common types of CVS repository corruption

See our list of features [1] for more details.  Presumably many of these
features would not be covered by your test framework, and are not
supported by the other conversion tools.

Unfortunately, our tests are mostly based on cvs2svn (i.e., not 2git);
that is, the conversion is done with cvs2svn and checked by verifying
the contents of the resulting Subversion repository.

The script contrib/verify-cvs2svn.py is another kind of test; it checks
every branch and tag out of CVS and the destination repository and
verifies that their contents are identical.  This script is intended to
be used by users to check their own conversion.  Please note that it
doesn't check the history, only the branch/tag tips.  But this script
works with both Subversion and git (at least it should; it probably
doesn't get tested much).

> Here is what I propose.  Let's build a common test suite that cvs2git,
> git-cvsimport, cvsps, and parsecvs can all use, apply it rigorously,
> and let the best tool win.  (This would mean, among other things, that
> git can stop carrying things that are essentially cvsps tests in its
> tree.)
I think it would be great to have a way to test across tools, though
please realize that the inference of the most plausible "true" CVS
history is partly objective but also often a matter of heuristics and
taste.  Moreover, the choice of how to represent the inferred history in
git, which has rather a different model than CVS/Subversion, is also
non-obvious and somewhat controversial.  I expect that there will be a
number of simple CVS repositories for which we can all agree about the
correct git output, but not far away will be a vast number for which the
"correct" answer is unclear.  Many of the interesting tests would fall
into the latter category.

> The two people I most need to sign off on this are, I guess, Michael
> Haggerty and either Junio Hamano or whoever specifically owns
> git-cvsimport and its tests. [...]

It's not clear what you want me to sign off on.  I guess you want to
replace (or augment?) the cvs2svn test suite with one based on your
framework?  Right off the top of my head I can think of a few
considerations from the point of view of the cvs2svn project:

* We definitely want to continue testing the Subversion output of
  cvs2svn.  A test suite that only tests the git output could at best be
  an addition to the current test suite, not a replacement for it.
  (That being said, the addition of good tests of the 2git output would
  be great.)

* A test suite that tests only the easy cases wouldn't really be
  interesting, because the difficult cases are where the potential
  problems lie.

* It would be unfortunate if the cvs2svn test suite would grow another
  run-time dependency or if we would have to invest a lot of time
  synchronizing with another project, though if the gain were big enough
  we could consider it.

* The licenses obviously have to be compatible to the extent required by
  the level of coupling.

* I don't have a lot of time to work on the integration.
  cvs2svn has long been at a level of maturity where it doesn't need
  much care and feeding, and I would like to keep it that way :-)
  Nowadays I am far more interested in working on the git project with
  my little available open-sourcin' time.

Rereading this email, I realize that it is not clear to me why your new
testing project needs the "signoff" or cooperation from any of the
conversion tool projects (git-cvsimport or cvs2svn or parsecvs or ...)
in the first place.  The essence of your project will be a collection of
CVS test repositories, and code that can read the conversion output
(whether via git or as fast-import data) and verify that it matches
expectations (right?).  Presumably it will have a place where any of the
conversion tools could be plugged into it, and perhaps a bit of code
that knows how to configure and run the best-known tools (and perhaps
even to download and build them).  It would seem natural to me that your
project stops there, and stays at arm's length from the conversion
projects.  If your test suite proves itself to be obviously better than
the cvs2svn test suite, then we might try to integrate it *then* (or
not; even then it wouldn't really be obligatory).

Michael

[1] http://cvs2svn.tigris.org/features.html

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/