On 9/14/06, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
Jon Smirl wrote:
> On 9/14/06, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Shawn Pearce wrote:
>>
>> > Originally I wanted Jon Smirl to modify the cvs2svn (...)
>>
>> By the way, will cvs2git (modified cvs2svn) and git-fast-import publicly
>> available?
>
> It has some unresolved problems so I wasn't spreading it around everywhere.
>
> It is based on cvs2svn from August. There has been too much change to
> the current cvs2svn to merge it anymore. [...]
>
> If the repo is missing branch tags cvs2svn may turn a single missing
> branch into hundreds of branches. The Mozilla repo has about 1000
> extra branches because of this.

[To explain to our studio audience:] Currently, if there is an actual branch in CVS but no symbol associated with it, cvs2svn generates branch labels like "unlabeled-1.2.3", where "1.2.3" is the branch revision number in CVS for the particular file. The problem is that the branch revision numbers for files in the same logical branch are usually different. That is why many extra branches are generated.

Such unnamed branches cannot reasonably be accessed via CVS anyway, and somebody probably made the conscious decision to delete the branch from CVS (though without doing it correctly). Therefore such revisions are probably garbage. It would be easy to add an option to discard such revisions, and we should probably do so. (In fact, they can already be excluded with "--exclude=unlabeled-.*".)

The only caveat is that it is possible for other, named branches to sprout from an unnamed branch. In this case either the second branch would have to be excluded too, or the unlabeled branch would have to be included.
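A small illustration of the point above, as a sketch rather than cvs2svn's actual code: the file names, branch numbers, and the RELEASE_1_0 symbol are made up, and treating the --exclude pattern as a full-string match is an assumption about how the option is applied.

```python
import re

# Hypothetical data: one logical (but unnamed) branch, seen from three files.
# The CVS branch number differs from file to file.
branch_numbers = {
    "src/a.c": "1.2.2",
    "src/b.c": "1.5.4",
    "src/c.c": "1.2.2",
}


def unlabeled_name(branch_number):
    """Build a label in the "unlabeled-<branch number>" style described above."""
    return "unlabeled-" + branch_number


# One logical branch ends up under two distinct labels.
labels = sorted({unlabeled_name(n) for n in branch_numbers.values()})

# Same pattern as in --exclude=unlabeled-.* (full-string match is an assumption).
exclude = re.compile(r"unlabeled-.*")


def is_excluded(symbol):
    """Return True if the symbol name matches the exclusion pattern."""
    return exclude.fullmatch(symbol) is not None


# RELEASE_1_0 is a hypothetical named symbol that should survive the filter.
kept = [s for s in labels + ["RELEASE_1_0"] if not is_excluded(s)]
```

With only three files the single branch already splits into two labels; across thousands of files the same effect produces hundreds of them.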
In MozCVS there are important branches where the first label has been deleted but subsequent branches sprout from the first branch. These subsequent branches are still visible in CVS. Someone else had this same problem on the cvs2svn list. This has happened twice on major branches. From manually looking at one of these, it appears the author wanted to change the branch name: they made a branch with the wrong name, branched again with the new name, and deleted the first branch.
Alternatively, there was a suggestion to add heuristics to guess which files' "unlabeled" branches actually belong in the same original branch. This would be a lot of work, and the result would never be very accurate (for one thing, there is no evidence of the branch whatsoever in files that had no commits on the branch).
You wrote up a detailed solution for this a few weeks ago on the cvs2svn list. The basic idea is to look at the change sets on the unlabeled branches. If a change set spans multiple unlabeled branches, those branches should be treated as one unlabeled branch rather than several. That would reduce the number of unlabeled branches from about 1000 to the true number, which I believe is in the 10-20 range. Would the dependency-based model make these relationships more obvious?
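To make the basic idea concrete, here is a minimal sketch using union-find. It is not the solution posted on the cvs2svn list, only one way to express "labels that share a change set belong together". The function name, the input shape (one list of unlabeled branch labels per change set), and the sample labels are all assumptions.

```python
from collections import defaultdict


def merge_unlabeled_branches(changesets):
    """Group unlabeled branch labels that share at least one change set.

    `changesets` is an iterable of lists of labels, one list per change set
    (a hypothetical input shape chosen for this sketch).
    """
    parent = {}

    def find(x):
        # Find the representative of x's group, with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for labels in changesets:
        labels = list(labels)
        for label in labels:
            find(label)  # register labels that appear alone
        for other in labels[1:]:
            union(labels[0], other)

    groups = defaultdict(set)
    for label in list(parent):
        groups[find(label)].add(label)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical change sets: the first two overlap, the third stands alone.
example = [
    ["unlabeled-1.2.2", "unlabeled-1.5.4"],
    ["unlabeled-1.5.4", "unlabeled-1.1.6"],
    ["unlabeled-1.9.2"],
]
merged = merge_unlabeled_branches(example)
```

Here four labels collapse into two groups. As noted earlier in the thread, files with no commits on the branch leave no evidence at all, so a grouping like this can only cover files that were actually changed.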
Other ideas are welcome.

Michael
--
Jon Smirl
jonsmirl@xxxxxxxxx