Re: I have end-of-lifed cvsps

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 19 Dec 2013 17:18:19 +0100

On 12/19/2013 04:26 PM, Johan Herland wrote:
> On Thu, Dec 19, 2013 at 10:31 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> On 12/19/2013 02:11 AM, Johan Herland wrote:
>>> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>>>> A correct incremental converter could be done (as long as the CVS users
>>>> don't literally change history retroactively) but it would be a lot of work.
>>>
>>> Although I agree with that sentence as it is stated, I also believe
>>> that the parenthesized condition rules out a _majority_ of CVS repos of
>>> non-trivial size/history. So even though a correct incremental
>>> converter could be built, it would be pretty much useless if it did
>>> not gracefully handle rewritten history. And in the face of rewritten
>>> history it becomes pretty much impossible to define what a "correct"
>>> conversion should even look like (not to mention the difficulty of
>>> actually implementing that converter...).
>>
>> A correct conversion would, conceptually, take a diff between the old
>> CVS history and the new CVS history (I'm talking about the history as a
>> whole, not a diff between two changesets), figure out what had changed,
>> and then figure out what Git commits to make to effect the same
>> conceptual changes in Git-land.
>>
>> This means that the final Git history would have to depend not only on
>> the current entirety of the CVS history, but also on what the CVS
>> history *was* during previous incremental imports and how the tool chose
>> to represent that history in Git the previous rounds.
>>
>> There is a tradeoff here.  The smarter the tool is, the fewer
>> restrictions would have to be made on what people can do in CVS.  For
>> example, it wouldn't be unreasonable to impose a rule that people are
>> not allowed to move files within the CVS repository (e.g., to fake
>> move-file-with-history) after the CVS <-> Git bridge is in use.  (Abuses
>> of the history that occurred *before* the first incremental conversion,
>> on the other hand, wouldn't be a problem.)  If the user of the
>> incremental tool has *no* influence on how his colleagues use CVS, then
>> the tool would have to be very smart and/or the user might
>> sometimes be forced to do another from-scratch conversion.
> 
> Agreed, but I find it quite ugly how the git history will end up
> different depending on _when_ the incremental conversion is run. It
> means that it will be impossible for two users to create the same Git
> repo (matching SHA1s), unless they carefully synchronize all of their
> conversion runs

Even git-svn doesn't guarantee the same results over time.  The most
obvious scenario when it fails is when somebody changes an SVN commit's
metadata retroactively using something like "svn propedit --revprop
svn:log".  Consistency over time across two independent conversion
processes (that don't communicate) is not even theoretically possible.
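
To illustrate why a retroactive log-message edit breaks SHA-1 equality:
Git names an object by the SHA-1 of a short header plus the object's
content, and a commit's content includes its log message.  Below is a
minimal sketch in Python.  The commit body is a made-up example (the
identity and timestamps are invented; the tree ID is Git's well-known
empty tree), not the output of any real converter.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 Git assigns: hash of "<type> <size>\\0" + content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_body(message: str) -> bytes:
    # Hypothetical commit: invented author/committer, empty-tree ID.
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        "author A U Thor <author@example.com> 1387469899 +0100\n"
        "committer A U Thor <author@example.com> 1387469899 +0100\n"
        "\n"
        f"{message}\n"
    ).encode()


# Same tree, same author, same dates; only the log message differs.
before = git_object_id("commit", commit_body("Fix typo"))
after = git_object_id("commit", commit_body("Fix typo in parser"))
print(before != after)  # prints True
```

Every descendant commit records its parent's ID, so the difference
propagates to all later commits as well.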

> (at which point it's much simpler to run a single
> conversion and then have both users fetch the result).

Yes.  That is a very reasonable approach.

[Discussion of hypothetical real-time inode-watching or proxy-based
converter omitted here...]
> Agreed, but if you want correct metadata (_when_ did these changes
> happen, _who_ performed them), then you need to actually monitor the
> CVS command stream (or CVS server files) in real time...

In my opinion it is ridiculous to try to design a CVS <-> Git bridge
that tries to use back-channels to fill in historical data that even CVS
doesn't record.  Such a thing would require an intimate connection to
the CVS server from the IT department that is presumably blocking a real
move to Git.  So who would ever be able to use it?

The only reason to record extra information would be to enable the
bridge to do self-consistent incremental conversions, and in that case
the *only* extra information that has to be recorded is the information
that would have anyway landed in Git during the previous conversion.
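
As a rough sketch of what "recording the information that landed in
Git" could make possible, the Python below compares a snapshot saved by
the previous run with the current CVS state and classifies the
differences.  The snapshot layout, the RevInfo fields, and the function
name are all assumptions made for illustration; no existing tool is
claimed to work this way.

```python
from typing import Dict, List, NamedTuple, Tuple


class RevInfo(NamedTuple):
    """Per-revision metadata that would end up in a Git commit."""
    author: str
    timestamp: int
    log: str


Key = Tuple[str, str]          # (file path, CVS revision number)
Snapshot = Dict[Key, RevInfo]  # hypothetical saved state of one import


def classify_changes(
    previous: Snapshot, current: Snapshot
) -> Tuple[List[Key], List[Key], List[Key]]:
    """Split the difference between two snapshots into three groups."""
    # New revisions: ordinary forward progress, safe to append.
    added = sorted(k for k in current if k not in previous)
    # Revisions that vanished: history was rewritten in CVS.
    removed = sorted(k for k in previous if k not in current)
    # Revisions whose metadata changed after they were imported.
    rewritten = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return added, removed, rewritten


previous = {
    ("src/main.c", "1.1"): RevInfo("alice", 1000, "initial"),
    ("src/main.c", "1.2"): RevInfo("bob", 2000, "fix bug"),
}
current = {
    ("src/main.c", "1.1"): RevInfo("alice", 1000, "initial"),
    ("src/main.c", "1.2"): RevInfo("bob", 2000, "fix bug #12"),
    ("src/main.c", "1.3"): RevInfo("carol", 3000, "add feature"),
}
added, removed, rewritten = classify_changes(previous, current)
```

Only the "added" group maps directly onto new Git commits; what to do
with the other two groups is exactly the hard policy question.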

>>> My point here is that people will use whatever available tools they
>>> have to solve whatever problems they are currently having. And when
>>> CVS is your tool, you will sooner or later end up with a "solution"
>>> that irrevocably rewrites your CVS history.
>>
>> Yes, but I maintain that an incremental importer could keep a Git
>> history that is consistent with the CVS history in the sense that:
>>
>> 1. the result of checking out any branch or tag, right after a run of
>> the importer, gives the same results as checking the same branch or tag
>> out of CVS.
>>
>> 2. the Git history from one run is added to (never rewritten) by the
>> next run.
> 
> Yes, and even my simplest/fastest possible converter described above
> can meet those criteria. After that, it really becomes a question of
> _how_much_ CVS history you want to retain in your incremental import.

I think you want enough history to make it pleasant to work with the
resulting Git repository.  That approximately means that you need some
semblance of the CVS commits to be reconstructed, with their correct
metadata, on the closest thing to their correct branches that is
consistent with the CVS <-> Git impedance mismatch.
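
Criterion 2 quoted above (history is added to, never rewritten) can be
checked mechanically.  The sketch below uses a toy in-memory commit
graph and reads the criterion as "every branch tip from the previous
run is an ancestor of, or equal to, the same branch's tip after the
next run".  That reading, the graph representation, and the function
names are assumptions for illustration.  On a real repository,
"git merge-base --is-ancestor" performs the same ancestry test.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit id -> list of parent commit ids


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` (or equal to it)."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def history_only_added_to(
    graph: Graph, old_refs: Dict[str, str], new_refs: Dict[str, str]
) -> bool:
    """Check that every old ref still exists and was only fast-forwarded."""
    return all(
        name in new_refs and is_ancestor(graph, tip, new_refs[name])
        for name, tip in old_refs.items()
    )


# Toy history: a <- b <- c on one line, and x as a sibling of b.
graph = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
appended = history_only_added_to(graph, {"master": "b"}, {"master": "c"})
rewritten = history_only_added_to(graph, {"master": "b"}, {"master": "x"})
```

Here "appended" is True because c descends from b, while "rewritten" is
False because x discards b.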

> I have described the two extremes above. Interestingly, _both_ of
> those extremes would look quite different from the
> whole-history-gone-incremental converters represented by cvs2git and
> cvs-fast-export, and _both_ of the extremes would probably also
> provide a converted result quite a bit faster than anything in between
> (one by virtue of depending on a single "cvs update" command, and the
> other by monitoring the CVS server and performing the conversion to
> Git in real time).

I am not an extremist.  And I know how much work it would be to start a
project like this from scratch.  After all, what it can do should be a
strict superset of what a tool like cvs2git can do, and cvs2svn/cvs2git
(according to Ohloh's COCOMO estimate) contains the equivalent of 7
person-years of effort.

Anyway, this is all just blah blah unless somebody volunteers to work on
it.  And I think that is highly unlikely, especially given the
decreasing number of CVS repositories in the wild.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/