Re: I have end-of-lifed cvsps

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 19 Dec 2013 10:31:37 +0100

On 12/19/2013 02:11 AM, Johan Herland wrote:
> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> A correct incremental converter could be done (as long as the CVS users
>> don't literally change history retroactively) but it would be a lot of work.
> 
> Although I agree with that sentence as it is stated, I also believe
> that the parenthesized condition rules out a _majority_ of CVS repos of
> non-trivial size/history. So even though a correct incremental
> converter could be built, it would be pretty much useless if it did
> not gracefully handle rewritten history. And in the face of rewritten
> history it becomes pretty much impossible to define what a "correct"
> conversion should even look like (not to mention the difficulty of
> actually implementing that converter...).

A correct conversion would, conceptually, take a diff between the old
CVS history and the new CVS history (I'm talking about the history as a
whole, not a diff between two changesets), figure out what had changed,
and then figure out what Git commits to make to effect the same
conceptual changes in Git-land.

This means that the final Git history would have to depend not only on
the current entirety of the CVS history, but also on what the CVS
history *was* during previous incremental imports and how the tool chose
to represent that history in Git in the previous rounds.
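
To make the "diff between the old CVS history and the new CVS history"
concrete, here is a minimal sketch in Python.  It assumes, purely for
illustration, that each import round saves a snapshot mapping every
branch or tag name to a {path: revision} dictionary.  The snapshot
layout and the name diff_snapshots are hypothetical; nothing like this
exists in cvs2git today.

```python
def diff_snapshots(old, new):
    """Compare two hypothetical CVS snapshots of the form
    {symbol: {path: revision}}, one per import round.

    Returns {symbol: {"added": ..., "removed": ..., "changed": ...}}
    for every branch or tag whose contents differ.
    """
    result = {}
    for symbol in sorted(set(old) | set(new)):
        before = old.get(symbol, {})
        after = new.get(symbol, {})
        # Files that carry the symbol now but did not in the last round.
        added = {p: r for p, r in after.items() if p not in before}
        # Files that carried the symbol last round but no longer do.
        removed = {p: r for p, r in before.items() if p not in after}
        # Files whose revision under this symbol has moved.
        changed = {
            p: (before[p], after[p])
            for p in before
            if p in after and before[p] != after[p]
        }
        if added or removed or changed:
            result[symbol] = {
                "added": added,
                "removed": removed,
                "changed": changed,
            }
    return result
```

The importer would then have to translate each entry of the result into
new Git commits, which is where the real work lies.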

There is a tradeoff here.  The smarter the tool is, the fewer
restrictions would have to be made on what people can do in CVS.  For
example, it wouldn't be unreasonable to impose a rule that people are
not allowed to move files within the CVS repository (e.g., to fake
move-file-with-history) after the CVS <-> Git bridge is in use.  (Abuses
of the history that occurred *before* the first incremental conversion,
on the other hand, wouldn't be a problem.)  If the user of the
incremental tool has *no* influence on how his colleagues use CVS, then
the tool would have to be very smart and/or the user might sometimes
be forced to do another from-scratch conversion.

> Here are just a couple of things a CVS user can do (and that happened
> fairly regularly at my previous $dayjob) that would make life
> difficult for an incremental converter (and that also makes stable
> output from a non-incremental converter hard to solve in practice):
> 
>  - A user "deletes" $file from $branch by simply removing the $branch
> symbol on $file (cvs tag -B -d $branch $file). CVS stores no record of
> this. Many non-incremental importers will see $file as never having
> existed on $branch. An incremental importer starting from a previously
> converted state, must somehow deal with that previous state no longer
> existing from the POV of CVS.

No problem; the tool could just add a synthetic commit "git rm"ming the
file from the branch.  It wouldn't know *when* the file was deleted, so
it would have to pick a plausible date between the time of the last
incremental conversion and the one that discovers that the branch tag
has been removed from the file.  The resulting Git history would contain
more complete information than CVS's history.
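
As a rough sketch of what that synthetic commit could look like, the
following Python picks the midpoint between the two import runs as the
"plausible date".  The midpoint is only one possible choice, and the
function name and the dictionary describing the commit are assumptions
made for this example, not the format of any existing tool.

```python
from datetime import datetime, timezone


def synthetic_removal(path, branch, last_import, this_import):
    """Describe a synthetic commit that removes `path` from `branch`.

    CVS does not record when the branch tag was dropped, so the commit
    is dated halfway between the two import runs (an arbitrary but
    plausible choice).
    """
    if this_import < last_import:
        raise ValueError("this_import must not precede last_import")
    when = last_import + (this_import - last_import) / 2
    return {
        "branch": branch,
        "action": "rm",
        "path": path,
        "date": when,
        "message": f"Synthetic commit: {path} no longer on {branch} in CVS",
    }


if __name__ == "__main__":
    last = datetime(2013, 12, 1, tzinfo=timezone.utc)
    now = datetime(2013, 12, 3, tzinfo=timezone.utc)
    print(synthetic_removal("src/foo.c", "stable", last, now))
```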

>  - A user moves a release tag on a few files to include a late bugfix
> into an upcoming release (cvs tag -F -r $new_rev $tag $file). There
> might be no single point in time where the tagged state existed in the
> repo, it has become a "Frankentag". You could claim user error here,
> and that such shortcuts should not happen, but that doesn't really
> prevent it from ever happening. Recreating the tree state of the
> Frankentag in Git is easy, but what kind of history do you construct
> to lead up to that tree?

Frankentags (tags that include file versions that didn't occur
contemporaneously) can occur even with one-time CVS->Git conversions.
The only way to handle them is to create a Git branch representing the
tag and base it at a plausible Git commit, and then (on the branch)
issue a fixup commit that makes the contents of the branch equal to the
contents of the CVS tag.  This is a problem that cvs2git already handles.

A hypothetical incremental importer would have to notice the changes in
the branch contents between the previous conversion and the current one,
and create commits on the branch to bring it in line with the current
contents.  This is no uglier than what a one-shot conversion already has
to do.
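
The base-plus-fixup idea can be sketched as follows.  Trees are modeled
as {path: revision} dictionaries, and the base is chosen as the
candidate commit that needs the smallest fixup.  That heuristic is an
assumption made for this example; I am not claiming it is how cvs2git
picks the base.  All names here are hypothetical.

```python
def tree_delta(base, target):
    """List the operations that turn tree `base` into tree `target`.

    Both trees are {path: revision} dictionaries.
    """
    ops = []
    for path in sorted(set(base) | set(target)):
        if path not in target:
            ops.append(("delete", path))
        elif base.get(path) != target[path]:
            ops.append(("write", path, target[path]))
    return ops


def plan_frankentag(commits, tag_contents):
    """Pick a base commit and a fixup for a tag with mixed-era contents.

    `commits` is a list of (commit_id, tree) candidates.  The candidate
    needing the fewest fixup operations wins; ties go to the earliest
    entry in the list.
    """
    if not commits:
        raise ValueError("need at least one candidate commit")
    base_id, base_tree = min(
        commits, key=lambda c: len(tree_delta(c[1], tag_contents))
    )
    return base_id, tree_delta(base_tree, tag_contents)
```

An incremental run could reuse tree_delta with the branch's previous
tip as `base` and the tag's current contents as `target`, committing the
result on top of the existing branch instead of recreating it.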

>  - A modularized project develops code on HEAD, and makes regular
> releases of each module by tagging the files in the module dir with
> "$modulename-$version". Afterwards a project-wide "stable" tag is
> moved on that subset of files to include the new module release into
> the "stable" tag. ("stable" is conceptually a branch, but the CVS
> mechanism used here is still the tag, since CVS branches cannot
> "follow" each other like in Git). This is pretty much the same
> Frankentag scenario as above, except that in this case it might be
> considered Best Practice (it was at our $dayjob), and not a
> shortcut/user error made by a single user.

Same problem and same solution as above, as far as I can see.

> (None of these examples even involve the "cvs admin" which allows you
> to do some truly scary and demented things to your CVS history...)

Even some of these might be permitted.  For example:

* Obsoleting already-converted revisions: it's a pretty stupid thing to
do in most cases and the tool could just ignore such events, retaining
the history in Git.  If the revisions were obsoleted because they
contained proprietary information or something, then you've got a bigger
problem on your hands but one that you would have even if you were using
pure Git.

* Retroactive changes to log messages: would probably have to be ignored
or handled via notes.

* Changes to the "default branch" (another brain-dead CVS feature
related to vendor branches): I'd have to think about it.  But handling
vendor branches is already difficult for a one-time converter because
CVS retains too little info (but cvs2git does it except in the most
ambiguous cases).  An incremental importer would have *more* information
than a one-shot importer, because it would have a hope of catching the
change to the default branch at roughly the time it occurred.

> My point here is that people will use whatever available tools they
> have to solve whatever problems they are currently having. And when
> CVS is your tool, you will sooner or later end up with a "solution"
> that irrevocably rewrites your CVS history.

Yes, but I maintain that an incremental importer could keep a Git
history that is consistent with the CVS history in the sense that:

1. the result of checking out any branch or tag, right after a run of
the importer, gives the same results as checking the same branch or tag
out of CVS.

2. the Git history from one run is added to (never rewritten) by the
next run.
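
These two conditions are easy to state as a check that could run after
every import.  The sketch below works on a toy model: `parents` maps a
commit id to its parent ids, `git_trees` maps a commit id to a
{path: revision} tree, and `cvs_trees` maps a branch or tag name to the
tree that CVS would check out.  All of these structures and names are
assumptions made for the example.

```python
def reachable(parents, start):
    """Return every commit reachable from `start`, including itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def check_run(old_tips, new_tips, parents, git_trees, cvs_trees):
    """Return a list of violations of the two consistency conditions."""
    problems = []
    # Condition 2: each ref's old tip must still be an ancestor of
    # (or equal to) its new tip, i.e. history was only added to.
    for ref, old in old_tips.items():
        new = new_tips.get(ref)
        if new is None or old not in reachable(parents, new):
            problems.append(f"{ref}: history was rewritten")
    # Condition 1: each ref's tip tree must match the CVS checkout.
    for ref, tip in new_tips.items():
        if git_trees.get(tip) != cvs_trees.get(ref):
            problems.append(f"{ref}: checkout differs from CVS")
    return problems
```

An empty list means the run satisfied both conditions in this model.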

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/