Re: Questions about git-fast-import for cvs2svn

Sean <seanlkml@xxxxxxxxxxxx> · Sun, 15 Jul 2007 12:01:49 -0400

On Sun, 15 Jul 2007 16:11:41 +0200
Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:

Hi Michael,

Will take a stab at answering your questions...

> 1. Is it a problem to create blobs that are never referenced?  The
> easiest point to create blobs is when the RCS files are originally
> parsed, but later we discard some CVS revisions, meaning that the
> corresponding blobs would never be needed.  Would this be a problem?

Not a problem.  Running "git gc" later will clean up any unreferenced objects.
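
As a rough sketch of what emitting a blob at parse time could look like
(Python; the helper name blob_command is made up for illustration, only the
"blob" / "mark" / "data" stream syntax comes from the fast-import man page):

```python
def blob_command(mark: int, content: bytes) -> bytes:
    """Build a fast-import 'blob' command that stores content under :mark.

    Hypothetical helper: the mark is just a number chosen by the frontend,
    and a blob that no later commit refers to is simply left unreferenced.
    """
    header = b"blob\nmark :%d\ndata %d\n" % (mark, len(content))
    # The byte count in 'data' covers the content only; the trailing LF
    # after the raw data is optional in the stream format.
    return header + content + b"\n"
```

A frontend could write one such command per RCS revision as it is parsed and
remember the mark, whether or not the revision ends up in a commit.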

> 2. It appears that author/committer require an email address.  How
> important is a valid email address here?

It's not necessary for the operation of Git itself; it's up to you to
decide how important the information is to your project.  You should
be able to set an empty email address for author or committer in
git fast-import as "name <>".
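
For illustration, a minimal Python sketch of building such a line, assuming
the raw date format (seconds since the epoch plus a UTC offset).  The helper
name ident_line and the sample username are hypothetical:

```python
def ident_line(role: str, name: str, email: str, when: int, tz: str = "+0000") -> str:
    """Format an 'author' or 'committer' line for a fast-import stream.

    Hypothetical helper. 'when' is seconds since the epoch and 'tz' is an
    offset such as "-0400" (the raw date format). An empty email yields "<>".
    """
    # The name is optional in the stream syntax; avoid a double space if it is empty.
    prefix = f"{role} {name} " if name else f"{role} "
    return f"{prefix}<{email}> {when} {tz}\n"
```

With a CVS username and no address, ident_line("committer", "jrandom", "",
1184515309, "-0400") gives the line "committer jrandom <> 1184515309 -0400".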

>    a. CVS commits include a username but not an email address.  If an
> email address is really required, then I suppose the person doing the
> conversion would have to supply a lookup table mapping username -> email
> address.

Yes, take a look at the author-file format supported by git-cvsimport and
git-svnimport (the -A option), which maps each username to an appropriate
name and email address for Git.
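
A small sketch of reading such a lookup table in Python, assuming one
"username=Full Name <email>" entry per line as in the git-cvsimport
documentation.  The function names and the fallback behaviour (use the bare
username with an empty email) are my own assumptions, not something either
tool prescribes:

```python
import re

# One mapping per line: username=Full Name <email>
_ENTRY = re.compile(r"^\s*([^=\s]+)\s*=\s*(.*?)\s*<([^>]*)>\s*$")


def parse_author_map(text: str) -> dict:
    """Parse author-file text into {username: (name, email)} (hypothetical helper)."""
    authors = {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:  # silently skip blank or malformed lines
            authors[match.group(1)] = (match.group(2), match.group(3))
    return authors


def lookup_author(authors: dict, username: str) -> tuple:
    """Return (name, email) for a CVS username, falling back to (username, "")."""
    return authors.get(username, (username, ""))
```

Unknown usernames then still produce a usable "name <>" identity rather than
aborting the conversion.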

>    b. CVS tag/branch creation events do not even include a username.
> Any suggestions for what to use here?

Perhaps just use your own username or one specifically created to
run the conversion process.

> 3. I expect we should set 'committer' to the value determined from CVS
> and leave 'author' unused.  But I suppose another possibility would be
> to set the 'committer' to 'cvs2svn' and the 'author' to the original CVS
> author.  Which one makes sense?

Another option is to just allow Git to set author and committer to the
same value.  As noted in the man page: "If author is omitted then
fast-import will automatically use the committer's information for
the author portion of the commit".
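
To make that concrete, here is a hedged Python sketch of a commit command
that carries only a committer line.  The helper name, the fixed 100644 file
mode and the sample values are assumptions for illustration:

```python
def commit_command(ref, committer, when, tz, message, file_marks):
    """Build a fast-import 'commit' command with a committer but no author.

    Hypothetical helper. 'committer' is already formatted as "name <email>",
    and 'file_marks' maps a path to the mark of a previously emitted blob.
    """
    msg = message.encode("utf-8")
    lines = [
        b"commit " + ref.encode(),
        # No 'author' line: fast-import reuses the committer information.
        b"committer " + committer.encode() + b" %d " % when + tz.encode(),
        b"data %d" % len(msg),
    ]
    out = b"\n".join(lines) + b"\n" + msg + b"\n"
    for path, mark in file_marks.items():
        # 100644 = regular, non-executable file
        out += b"M 100644 :%d %s\n" % (mark, path.encode())
    return out + b"\n"
```

For example, commit_command("refs/heads/master", "jrandom <>", 1184515309,
"-0400", "Initial import\n", {"README": 1}) yields a complete commit whose
author and committer end up identical.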

> 4. It appears that a commit can only have a single 'from', which I
> suppose means that files can only be added to one branch from a single
> source branch/revision in a single commit.  But CVS branches and tags
> can include files from multiple source branches and/or revisions.  What
> would be the most git-like way to handle this situation?  Should the
> branch be created in one commit, then have files from other sources
> added to it in other commits?  Or should (is this even possible?) all
> files be added to the branch in a single commit, using multiple "merge"
> sources?

Git supports the ability to merge from multiple branches at once (known
as an octopus merge).  So it's possible to start a new branch, drawing
in files from more than one source branch in a single commit.  As I
understand it, fast-import allows only a single "from" line per commit,
but accepts multiple "merge" lines for additional parents.
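
A sketch of how a frontend might turn a list of source commits into those
lines (Python; parent_lines is a made-up name, and the parents are assumed
to be marks or other committish strings that were defined earlier in the
stream):

```python
def parent_lines(parents):
    """Return the 'from' / 'merge' lines for a commit with the given parents.

    Hypothetical helper: the first parent becomes the single 'from' line and
    every further parent becomes a 'merge' line.
    """
    if not parents:
        return ""  # root commit: no parentage lines at all
    first, *rest = parents
    lines = [f"from {first}"] + [f"merge {p}" for p in rest]
    return "\n".join(lines) + "\n"
```

Note that the "merge" lines only record parentage.  The files taken from
each source still have to be listed explicitly in the commit.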

> 5. Is there any significance at all to the order that commits are output
> to git-fast-import?  Obviously, blobs have to be defined before they are
> used, and '<committish>'s have to be defined before they are referenced.
>  But is there any other significance to the order of commits?

Don't think so, except perhaps for the packfile optimization issues
mentioned in the man page.

HTH,
Sean