Re: Fwd: git cvsimport implications

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 17 May 2013 17:28:56 +0200

On 05/17/2013 03:34 PM, Andreas Krey wrote:
> On Fri, 17 May 2013 15:14:58 +0000, Michael Haggerty wrote:
> ...
>> We both know that the CVS history omits important data, and that the
>> history is mutable, etc.  So there are lots of hypothetical histories
>> that do not contradict CVS.  But some things are recorded unambiguously
>> in the CVS history, like
>>
>> * The contents at any tag or the tip of any branch (i.e., what is in the
>> working tree when you check it out).
> 
> Except that the tags/branches may be made in a way that can't
> be mapped onto any commit/point of history otherwise exported,
> with branches that are done on parts of the trees first, or
> likewise tags.

This is true, but cvs2git nevertheless puts the required content on the
branch so that it checks out correctly.  In other words, a "CVS tag
creation" (which might not have been done at a single point in time) is
done by cvs2git roughly like this (assume it is from master):

1. Make a list of all versions of all files that have to be in the tag.

2. When one of those file versions has to be overwritten (e.g., because
a later version of that file needs to be added to master), create a Git
tag-branch containing all of the files that are currently at the correct
version for the tag.  (It has to be a Git branch, not a tag, because we
might have to change it later.)

3. As other files on master go through the revisions needed for the tag,
create new commits on the tag-branch that add those revisions of those
files to the tag-branch.

At the end of the process, the tag-branch has the same contents as the
CVS tag, though it may have had to be created via multiple commits.
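
A rough Python sketch of the three steps above, for illustration only.
This is not cvs2git's actual code: build_tag_branch, the (path, revision)
commit tuples, and the sample file names are all made up here, and the
sketch ignores deletions, timestamps, and commit parents.

```python
def build_tag_branch(master_commits, tag):
    """Replay master and build a tag-branch whose contents end up equal to `tag`.

    master_commits: ordered list of (path, revision) changes on master.
    tag:            step 1, the {path: revision} set the tag must contain.
    Returns (final branch contents, list of changes committed on the branch).
    """
    master = {}           # current contents of master
    branch = None         # tag-branch contents; None until it is created
    branch_commits = []   # one dict per commit made on the tag-branch

    for path, rev in master_commits:
        # Step 2: master is about to overwrite a revision the tag needs.
        overwrites_tagged = (
            path in tag and master.get(path) == tag[path] and rev != tag[path]
        )
        if branch is None and overwrites_tagged:
            # Branch off with every file that is already at its tagged revision.
            branch = {p: r for p, r in master.items() if tag.get(p) == r}
            branch_commits.append(dict(branch))

        master[path] = rev

        # Step 3: a file on master has just reached its tagged revision.
        if branch is not None and tag.get(path) == rev and branch.get(path) != rev:
            branch[path] = rev
            branch_commits.append({path: rev})

    if branch is None:
        # Nothing tagged was ever overwritten: the tag sits at the tip of master.
        branch = {p: r for p, r in master.items() if tag.get(p) == r}
        branch_commits.append(dict(branch))
    return branch, branch_commits


# A tag that matches no single point on master: a.c moved on to 1.2
# before b.c reached 1.2.
history = [("a.c", "1.1"), ("b.c", "1.1"), ("a.c", "1.2"),
           ("b.c", "1.2"), ("a.c", "1.3")]
wanted = {"a.c": "1.1", "b.c": "1.2"}
branch, commits = build_tag_branch(history, wanted)
print(branch == wanted, len(commits))
```

In this example the tag-branch needs two commits: one when a.c 1.1 is
about to be overwritten, and one more when b.c reaches 1.2.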

Currently, step 3 creates merge commits from master to the tag-branch.
This is sometimes what one would expect and sometimes not.  It is really
a matter of taste, because the CVS history is in this respect more
flexible than what Git's history model can represent.

> ...
>> That being said, I appreciate that cvsimport can do incremental imports.
>>  cvs2git doesn't even attempt it.  I've thought about what it would take
>> to implement correct incremental imports in cvs2svn/cvs2git, and it is
> 
> Do these two produce stable output? That is, return the same commits
> for multiple runs on the same repo?

It usually produces stable output, but not always.  I've had reports of
users using cvs2git successfully as an "incremental importer" by simply
running the full import each time and relying on Git to match up the
overlapping part of the history simply because the SHA-1s are identical.
But (1) the later conversions would be just as slow as the first, (2)
some of the heuristic decisions for grouping CVS file changes into Git
changesets can be affected by later commits, and (3) CVS history is
mutable; if the CVS history is changed retroactively in any way then it
won't work at all.
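
To illustrate why identical SHA-1s make the overlap line up, and why a
retroactive change breaks it, here is a small Python sketch.  The blob
hash follows Git's actual blob format; chain_ids is only a simplified
stand-in for commit ids (a real commit also covers the tree, author,
committer, and dates), and the changeset strings are invented.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Id Git gives a blob: SHA-1 over "blob <size>\\0" followed by the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def chain_ids(changesets):
    """Simplified commit ids: each id covers the parent's id plus its own payload."""
    ids = []
    parent = ""
    for payload in changesets:
        parent = hashlib.sha1((parent + "\n" + payload).encode()).hexdigest()
        ids.append(parent)
    return ids


def shared_prefix(a, b):
    """Number of leading ids that two conversion runs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


first_run = chain_ids(["add foo.c 1.1", "edit foo.c 1.2"])
# A later full conversion of the same, grown repository.
second_run = chain_ids(["add foo.c 1.1", "edit foo.c 1.2", "edit foo.c 1.3"])
# The same repository after the oldest log message was changed in CVS.
rewritten = chain_ids(["add foo.c 1.1 (log edited)", "edit foo.c 1.2",
                       "edit foo.c 1.3"])

print(shared_prefix(first_run, second_run))   # old commits keep their ids
print(shared_prefix(second_run, rewritten))   # every id from the change on differs
```

Because each id depends on its parent's id, a change to an early
changeset (whether from a retroactive CVS edit or from a different
grouping decision) changes the id of every commit after it.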

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/