Re: import determinism

On 11/07/2010 09:25 PM, Enrico Weigelt wrote:
> I'm curious on how deterministic the imports (git-cvsimport and
> git-svn) are. Suppose I clone the same cvs repo twice (assuming
> no write access in between), are the resulting object SHA-1's
> the same ?

On 11/09/2010 02:43 PM, Enrico Weigelt wrote:
> The point behind this is: I'm running a growing number of cvs2git
> mirrors and don't want to do full backups of them.

If you are using cvs2git, why are you asking about git-cvsimport and
git-svn?

No tool that imports from CVS or Subversion can make a blanket guarantee
about consistency across conversions because both CVS and SVN allow
retroactive changes to the project history.  For example:

* Both CVS and SVN allow commit messages and other metadata of old
commits to be changed.

* CVS allows files to be added retroactively to tags and branches with
no timestamp indicating that the file was not part of the original tag.

* CVS allows old revisions to be "obsoleted" (i.e., expunged from history).

* In CVS it is common practice for people to muck about directly in the
repository, for example renaming *,v files.

So (in the general case) there is no way to guarantee that two
independent conversions will have consistent results for the overlapping
parts of their history.  And even incremental conversions will
necessarily have to decide between converting the current state of the
repository accurately and converting in a way that is consistent with
earlier conversions.

In practice, especially if you are willing to constrain what the CVS
users are allowed to do, the overlapping parts of two conversions should
usually be identical or at least very similar (with older history more
likely to be identical).  Perhaps an rsync-style backup would be smart
enough to copy only the changed part of the history without excluding
the possibility that there are retroactive changes between subsequent
conversions.
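
To illustrate why older history is more likely to stay identical, here is
a minimal Python sketch. It assumes a simplified commit format: real git
commit objects also record author, committer, and timestamps, and the tree
IDs "t1".."t4" and the helper names are made up for the example. Because
each commit ID covers its parent's ID, a retroactive change alters that
commit and everything after it, but nothing before it.

```python
import hashlib


def commit_id(parent_id, message, tree_id):
    """Hash a simplified commit object the way git hashes objects:
    SHA-1 over a "<type> <length>\\0" header followed by the body.
    Simplification: author, committer, and timestamps are left out."""
    body = f"tree {tree_id}\n"
    if parent_id:
        body += f"parent {parent_id}\n"
    body += f"\n{message}\n"
    data = body.encode()
    header = f"commit {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def convert(history):
    """history: list of (message, tree_id) pairs, oldest first.
    Returns the commit IDs of the resulting linear chain."""
    ids = []
    parent = None
    for message, tree_id in history:
        parent = commit_id(parent, message, tree_id)
        ids.append(parent)
    return ids


def shared_prefix_length(a, b):
    """Number of leading commits that two conversions have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical history, converted twice; the third commit message was
# edited in the source repository between the two conversions.
first = convert([
    ("initial import", "t1"),
    ("fix typo", "t2"),
    ("add feature", "t3"),
    ("release", "t4"),
])
second = convert([
    ("initial import", "t1"),
    ("fix typo", "t2"),
    ("add feature (reworded)", "t3"),
    ("release", "t4"),
])

print(shared_prefix_length(first, second))
```

Here the first two commits keep their IDs, while the edited commit and
its unchanged descendant both get new ones. A backup that copies only
missing objects would therefore still find the older objects in place.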

If you run two cvs2git conversions on *exactly* the same CVS repository,
then the results *should* be identical.  I have always tried to process
data in a defined order rather than, say, in filesystem or
hashmap-determined order.  But AFAIK this property has not been tested
and could easily be buggy if I overlooked some source of indeterminism
somewhere in the cvs2git code.
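
One way to test this would be to import two conversions into separate git
repositories and compare their refs. The following is only a sketch in
Python: the repository paths are hypothetical, and the helper names are
mine, not part of cvs2git. It relies on `git show-ref`, which prints one
"<sha1> <refname>" line per ref. Since each ref's SHA-1 covers the whole
history reachable from it, identical refs mean identical history.

```python
import subprocess


def parse_show_ref(text):
    """Parse `git show-ref` output into a {refname: sha1} dict."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def read_refs(repo_path):
    """Read all refs of the git repository at repo_path.
    Note: `git show-ref` exits non-zero when a repository has no refs,
    which check=True turns into an exception."""
    result = subprocess.run(
        ["git", "-C", repo_path, "show-ref"],
        check=True, capture_output=True, text=True,
    )
    return parse_show_ref(result.stdout)


def diff_refs(a, b):
    """Return (refs whose SHA-1 differs, refs only in a, refs only in b),
    each sorted so that the report itself is deterministic."""
    differing = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return differing, only_a, only_b
```

Usage would be along the lines of
`diff_refs(read_refs("run1.git"), read_refs("run2.git"))`, where both
paths are placeholders. Three empty lists mean that the two conversions
agree on every branch and tag.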

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/