Re: Problem with git-cvsimport

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Wed, 31 Oct 2007 05:42:09 +0100

Mike Snitzer wrote:
> On 10/10/07, Eyvind Bernhardsen <eyvind-git-list@xxxxxxxxxxxxxx> wrote:
> ...
>> Thanks for making cvs2svn the best CVS-to-git conversion tool :)  Now
>> if it would only support incremental importing...
> 
> I second this question: is there any chance incremental importing will
> be implemented in cvs2svn?

Unfortunately, no, there is not much chance that I will implement this.
I wouldn't be interested in a works-most-of-the-time solution, and a
reliable solution would take weeks to implement.

If somebody else wants to implement this feature, I would be happy to
help him get started, answer questions, discuss the design, etc.  Or if
somebody wants to sponsor the work, I might be able to justify working
on it myself.  But otherwise, I'm afraid it is unlikely to happen.

> I've not used cvs2svn much and when I did it was for svn not git; but
> given that git-cvsimport is known to mess up your git repo (as Eyvind
> pointed out earlier) there don't appear to be any reliable tools to
> allow for incrementally importing from cvs to git.

That's because it is quite a tricky problem, especially since CVS allows
history to be changed retroactively; for example,

- shift a tag to a different file revision

- add an existing tag to a new file or remove it from an old file

- delete ("obsolete") old revisions

- change files from vendor branches to main line of development

- even nastier server-side repository manipulations like deleting an RCS
file, renaming a file, etc.

These things really happen in the topsy-turvy CVS world; indeed, they
are a part of many organizations' standard workflow.

The heuristics that cvs2svn uses to determine changesets, choose branch
parents, fix clock skew, etc. all rely on repository-wide information.
Therefore the naive approach of running a full conversion a second time
and just skipping over the revisions that were handled during the first
conversion would not even begin to work.  (I believe that this is the
approach of cvsps, which uses mostly local information to determine
changesets.)

I think the correct approach would involve recording the "frontier" of
the CVS repository, then at the next incremental conversion:

1. compare the current CVS repository to the recorded information

2. emit "fixup" changesets to reflect any CVS changes that happened
behind the previous "frontier"

3. emit changesets to reflect CVS changes beyond the frontier.

It is step 2 that is IMO the trickiest because it is so open-ended, and
modern SCMs don't allow all of the corresponding operations in any
straightforward way.  Presumably one would have to prohibit some of the
nastier CVS tricks and abort the incremental conversion if any are detected.
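
To make the three steps concrete, here is a minimal sketch in Python (the
language cvs2svn is written in).  It is only an illustration under stated
assumptions, not cvs2svn code: FileState, plan_incremental and
UnsupportedCvsChange are hypothetical names, the "frontier" is assumed to be
a per-file record of revision numbers and tag positions, and the choice of
which retroactive changes to abort on (vanished RCS files, deleted
revisions) is mine.

```python
from dataclasses import dataclass, field


class UnsupportedCvsChange(Exception):
    """A retroactive change that this sketch refuses to replay."""


@dataclass
class FileState:
    """Hypothetical per-file record stored at the end of a conversion."""
    revisions: set = field(default_factory=set)  # e.g. {"1.1", "1.2"}
    tags: dict = field(default_factory=dict)     # tag name -> revision


def rev_key(rev):
    """Order "1.10" after "1.9" by comparing numeric components."""
    return tuple(int(part) for part in rev.split("."))


def plan_incremental(frontier, current):
    """Step 1: compare the recorded frontier with the current state.

    Both arguments map an RCS file path to a FileState.  Returns
    (fixups, new_work): changes behind the frontier (step 2) and
    changes beyond it (step 3).  Raises UnsupportedCvsChange for the
    manipulations this sketch chooses to prohibit.
    """
    # Tags that already existed at the previous conversion; touching
    # one of them is a change *behind* the frontier.
    known_tags = {tag for state in frontier.values() for tag in state.tags}
    fixups, new_work = [], []

    for path in sorted(set(frontier) | set(current)):
        old = frontier.get(path, FileState())
        new = current.get(path)
        if new is None:
            # RCS file deleted or renamed on the server (assumed fatal).
            raise UnsupportedCvsChange(f"RCS file vanished: {path}")

        lost = old.revisions - new.revisions
        if lost:
            # Old revisions were obsoleted (assumed fatal).
            raise UnsupportedCvsChange(
                f"revisions deleted in {path}: {sorted(lost, key=rev_key)}")

        # Revisions added since the last run lie beyond the frontier.
        for rev in sorted(new.revisions - old.revisions, key=rev_key):
            new_work.append(("commit", path, rev))

        # Tags that were moved, added to, or removed from this file.
        for tag in sorted(set(old.tags) | set(new.tags)):
            before, after = old.tags.get(tag), new.tags.get(tag)
            if before == after:
                continue
            change = ("tag", path, tag, before, after)
            (fixups if tag in known_tags else new_work).append(change)

    return fixups, new_work
```

Note that even this toy version needs one piece of repository-wide
information (the set of previously known tags) just to decide whether a tag
change belongs to step 2 or step 3.  It also says nothing about how to turn
the planned changes into changesets, which is where the real difficulty
lies.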

Furthermore, for many use-cases of incremental conversion the conversion
would have to run quickly.  Therefore, the incremental conversion code
should be written with a strong emphasis on achieving good performance.

Michael