Re: cvs2svn conversion directly to git ready for experimentation

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 03 Aug 2007 01:19:34 +0200

Linus Torvalds wrote:
> On Thu, 2 Aug 2007, Steffen Prohaska wrote:
>> Right now, I'd prefer the import by parsecvs because of the
>> simpler history. However, I don't know if I lose history
>> information by doing so. I'd start by a run of cvs2svn to validate
>> the overall structure of the CVS repository.
> 
> Well, once imported, you could just go through the branches and tags, and 
> just delete the ones you consider uninteresting, and then do a "git gc".
> 
> You'd want to re-pack after a fast-import anyway (regardless of the source 
> of the fast-import input), so maybe cvs2svn ends up giving you a bit 
> unnecessary info, but it should be easy enough to get rid of 
> after-the-fact.

The real goal is to get cvs2svn to include the useful information and
exclude the rest. :-)
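
Until then, the after-the-fact cleanup Linus describes could be scripted
roughly as follows.  This is only a sketch: it builds the git command
lines rather than running them, the function name cleanup_commands is
made up for illustration, and it assumes the helper branches all live
under refs/heads/ with names starting with "TAG_FIXUP".

```python
from typing import Iterable, List


def cleanup_commands(
    refs: Iterable[str],
    helper_prefix: str = "refs/heads/TAG_FIXUP",  # assumed helper naming
) -> List[List[str]]:
    """Return git command lines that delete helper refs, then repack.

    `refs` is a list of full ref names, e.g. as printed by
    `git for-each-ref --format='%(refname)'`.
    """
    commands = [
        ["git", "update-ref", "-d", ref]
        for ref in refs
        if ref.startswith(helper_prefix)
    ]
    # Repack after the import regardless of whether anything was deleted.
    commands.append(["git", "gc"])
    return commands
```

Each returned list can be handed to subprocess.run() inside the
converted repository once the import has finished.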

I definitely want to address the problem of the helper branches used to
create tags.  This problem has two aspects:

1. The helper branches should be deleted after the tag has been defined.
I simply couldn't figure out how to do this using git-fast-import, and
git-fast-import complained when I tried to use a branch called
"TAG_FIXUP" without the "refs/heads/" prefix.

2. The helper branch is not needed at all if an existing revision has
exactly the same contents as needed on the tag.  This requires cvs2svn
to keep a record of which files exist in the complete file tree on every
branch at every revision (which it can already do, though it is
expensive), and also to give it the smarts to choose the optimal tag
point (which it already does, except that it currently doesn't penalize
sources that require files to be deleted before making the tag).
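
To make the second point concrete, here is a rough sketch of how a tag
source could be scored so that deletions count against it.  This is not
cvs2svn's actual code; the names fixup_cost and best_tag_source, the
path -> revision dictionaries, and the delete_penalty weight are all
assumptions for illustration.  A cost of zero means an existing revision
already matches the tag and no helper branch is needed.

```python
from typing import Mapping, Tuple

# A file tree maps each path to the CVS revision present at that point.
Tree = Mapping[str, str]


def fixup_cost(source: Tree, wanted: Tree, delete_penalty: int = 1) -> int:
    """Count the fixups needed to turn `source` into the tagged tree."""
    # Files that are missing from the source or are at the wrong revision.
    changed = sum(1 for path, rev in wanted.items() if source.get(path) != rev)
    # Files that exist in the source but must be deleted for the tag.
    deleted = sum(1 for path in source if path not in wanted)
    return changed + delete_penalty * deleted


def best_tag_source(candidates: Mapping[str, Tree], wanted: Tree) -> Tuple[str, int]:
    """Return the (name, cost) of the cheapest candidate source."""
    # Iterate in sorted order so that ties are broken deterministically.
    name = min(sorted(candidates), key=lambda n: fixup_cost(candidates[n], wanted))
    return name, fixup_cost(candidates[name], wanted)
```

With delete_penalty set to 0 the scoring ignores deletions, which is the
current behaviour described above; any positive value penalizes them.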

If the problem is lots of seemingly-unnecessary merges involving a
vendor branch, then it is time for me or some other volunteer to add the
optimization of allowing branches to be grafted from the vendor branch
to trunk.  I know of the problem and have a good idea how to implement
it; it is just a matter of finding the time to get it done.

If the problem is unlabeled branches that can't be excluded (because
other branches or tags depend on them), then the real problem is that it
is not known which unlabeled branches in individual files correspond to
the same project-wide conceptual branch.  I have considered two
possibilities to improve this situation:

1. Allow unlabeled -- indeed any -- branches to be discarded even if
other branches or tags depend on them.  This could be done by
incorporating the content of the source revision (i.e., the revision on
the unlabeled branch that is going to be discarded) into the zeroth
revision of the daughter branch, then grafting the daughter onto the
branch from which the unlabeled branch sprouted.

2. Rename the unlabeled branches by figuring out which unlabeled branch
in fileA corresponds to which unlabeled branch in fileB, fileC, etc.
This would involve a tricky bit of matching file-wise dependency trees
onto one another to unify unlabeled branch labels, keeping in mind that:

  - The trees have other differences as well.
  - The unlabeled branch does not necessarily occur in every file.
  - There may be multiple unlabeled branches per file.
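
A very naive version of the matching in option 2 might look like the
sketch below.  It is a greedy heuristic, not a worked-out algorithm, and
everything in it is hypothetical: the UnlabeledBranch record, the idea
of using the tags and branches that sprout from an unlabeled branch as
its fingerprint, and the rule that two unlabeled branches are unified
when they share a parent and at least one dependent symbol.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class UnlabeledBranch:
    path: str                   # file the branch occurs in
    branch_number: str          # e.g. "1.3.2" (illustrative)
    parent: str                 # labeled branch it sprouts from
    dependents: FrozenSet[str]  # tags/branches that sprout from it


@dataclass
class Group:
    parent: str
    dependents: Set[str] = field(default_factory=set)
    members: List[UnlabeledBranch] = field(default_factory=list)


def unify(branches: Iterable[UnlabeledBranch]) -> List[Group]:
    """Greedily group unlabeled branches that look like the same branch."""
    groups: List[Group] = []
    for b in branches:
        for g in groups:
            # A file may have several unlabeled branches; never put two
            # branches from the same file into one group.
            same_file = any(m.path == b.path for m in g.members)
            if g.parent == b.parent and not same_file and g.dependents & b.dependents:
                break
        else:
            # No compatible group found: start a new one.
            g = Group(parent=b.parent)
            groups.append(g)
        g.members.append(b)
        g.dependents |= b.dependents
    return groups
```

Groups may cover only some of the files, since the unlabeled branch does
not have to occur everywhere.  The result depends on the input order,
which is one reason a real implementation would need something smarter.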

Michael