Re: cvs import

"Jon Smirl" <jonsmirl@xxxxxxxxx> · Thu, 14 Sep 2006 13:17:43 -0400

On 9/14/06, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
Jon Smirl wrote:
> On 9/14/06, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Shawn Pearce wrote:
>>
>> > Originally I wanted Jon Smirl to modify the cvs2svn (...)
>>
>> By the way, will cvs2git (modified cvs2svn) and git-fast-import be publicly
>> available?
>
> It has some unresolved problems so I wasn't spreading it around everywhere.
>
> It is based on cvs2svn from August. There has been too much change to
> the current cvs2svn to merge it anymore. [...]
>
> If the repo is missing branch tags, cvs2svn may turn a single missing
> branch into hundreds of branches. The Mozilla repo has about 1000
> extra branches because of this.

[To explain to our studio audience:] Currently, if there is an actual
branch in CVS but no symbol associated with it, cvs2svn generates branch
labels like "unlabeled-1.2.3", where "1.2.3" is the branch revision
number in CVS for the particular file.  The problem is that the branch
revision numbers for files in the same logical branch are usually
different.  That is why many extra branches are generated.
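To make the multiplication concrete, here is a minimal Python sketch. It assumes a hypothetical per-file table of branch revision numbers for one logical branch; the file names, revision numbers, and helper names are made up for illustration. Only the `unlabeled-` naming follows the description above.

```python
from __future__ import annotations

# Hypothetical example: one logical CVS branch that never got a symbolic
# name. Each file records its own branch revision number, and those
# numbers depend on each file's own history, so they usually differ.
logical_branch = {
    "src/main.c": "1.2.2",
    "src/util.c": "1.5.2",
    "include/util.h": "1.1.4",
    "README": "1.2.2",  # coincidentally the same number as src/main.c
}

def unlabeled_name(branch_number: str) -> str:
    """Label in the style described above for a branch with no symbol."""
    return f"unlabeled-{branch_number}"

def generated_branches(per_file: dict[str, str]) -> set[str]:
    """Distinct branch labels produced for a single logical branch."""
    return {unlabeled_name(num) for num in per_file.values()}

names = generated_branches(logical_branch)
print(sorted(names))
```

Four files on one logical branch yield three separate labels here; only files that happen to share a branch number collapse into the same label.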

Such unnamed branches cannot reasonably be accessed via CVS anyway, and
somebody probably made the conscious decision to delete the branch from
CVS (though without doing it correctly).  Therefore such revisions are
probably garbage.  It would be easy to add an option to discard such
revisions, and we should probably do so.  (In fact, they can already be
excluded with "--exclude=unlabeled-.*".)  The only caveat is that it is
possible for other, named branches to sprout from an unnamed branch.  In
this case either the second branch would have to be excluded too, or the
unlabeled branch would have to be included.
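The exclusion and its caveat can be illustrated with a small sketch. The branch graph, the helper names `excluded_by` and `orphaned_by`, and the full-match regex semantics are assumptions for illustration, not cvs2svn's actual code. It only mimics the effect of `--exclude=unlabeled-.*` and flags named branches that would lose the branch they grow from.

```python
from __future__ import annotations

import re

# Hypothetical symbol graph: each branch maps to the branch it sprouts
# from (None means it sprouts directly from trunk).
parents = {
    "unlabeled-1.2.2": None,
    "FEATURE_X": "unlabeled-1.2.2",  # named branch rooted on an unnamed one
    "RELEASE_1_0": None,
    "unlabeled-1.5.2": None,
}

def excluded_by(pattern: str, names) -> set[str]:
    """Names the regex matches in full (assumed semantics)."""
    rx = re.compile(pattern)
    return {name for name in names if rx.fullmatch(name)}

def orphaned_by(parents: dict[str, str | None], excluded: set[str]) -> dict[str, str]:
    """Map each kept branch to the nearest excluded branch it grows from."""
    result = {}
    for name, ancestor in parents.items():
        if name in excluded:
            continue
        # Walk up the (assumed acyclic) chain of parent branches.
        while ancestor is not None:
            if ancestor in excluded:
                result[name] = ancestor
                break
            ancestor = parents.get(ancestor)
    return result

excluded = excluded_by(r"unlabeled-.*", parents)
blocked = orphaned_by(parents, excluded)
for branch, parent in blocked.items():
    # Either exclude the named branch as well, or keep its unlabeled parent.
    print(f"{branch} sprouts from excluded {parent}")
```

Here `FEATURE_X` is reported, which is exactly the situation where one of the two resolutions above has to be chosen.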

In MozCVS there are important cases where the label of a first branch
has been deleted but later branches sprouting from that first branch
still exist. These later branches are still visible in CVS. Someone else
had the same problem on the cvs2svn list. This has happened twice on
major branches.

Looking manually at one of these, it appears that the author wanted to
change the branch name. They made a branch with the wrong name,
branched again from it with the new name, and deleted the first branch.

Alternatively, there was a suggestion to add heuristics to guess which
files' "unlabeled" branches actually belong in the same original branch.
This would be a lot of work, and the result would never be very
accurate (for one thing, there is no evidence of the branch whatsoever
in files that had no commits on the branch).

You wrote up a detailed solution for this a few weeks ago on the
cvs2svn list. The basic idea is to look at the change sets on the
unlabeled branches. If a change set spans multiple unlabeled branches,
those branches should be merged into one. That would reduce the number
of unlabeled branches from 1000 down to the true number, which I
believe is in the 10-20 range.
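The change-set idea amounts to computing connected components: any two unlabeled branches touched by the same change set belong together. Below is a minimal union-find sketch of that idea as summarized here, not the actual proposal from the cvs2svn list. It assumes change sets have already been formed and represents each one as the set of unlabeled branch labels its file revisions lie on; the labels and the `merge_unlabeled` helper are hypothetical.

```python
from collections import defaultdict

class DisjointSet:
    """Minimal union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_unlabeled(changesets):
    """Group unlabeled branch labels that share at least one change set."""
    ds = DisjointSet()
    for labels in changesets:
        labels = list(labels)
        for label in labels:
            ds.find(label)  # register labels seen in single-label change sets too
        for other in labels[1:]:
            ds.union(labels[0], other)
    groups = defaultdict(set)
    for label in list(ds.parent):
        groups[ds.find(label)].add(label)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical change sets, each given as the labels its revisions lie on.
changesets = [
    {"unlabeled-1.2.2", "unlabeled-1.5.2"},  # one commit touching two files
    {"unlabeled-1.5.2", "unlabeled-1.1.4"},
    {"unlabeled-1.7.2"},
]
print(merge_unlabeled(changesets))
```

Because a single change set is enough to join two groups, one mis-grouped commit would fuse two genuinely separate branches, so a real implementation would probably want some sanity check on the merges.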

Would the dependency-based model make these relationships more obvious?

Other ideas are welcome.

Michael

--
Jon Smirl
jonsmirl@xxxxxxxxx