Re: [Monotone-devel] cvs import

Daniel Carosone <dan@xxxxxxxxxxx> · Thu, 14 Sep 2006 09:21:39 +1000

On Wed, Sep 13, 2006 at 03:52:00PM -0700, Nathaniel Smith wrote:
> This isn't a trivial problem.  I think the main thing you want to avoid
> is:
>     1  2  3  4
>     |  |  |  |
>   --o--o--o--o----- <-- current frontier
>     |  |  |  |
>     A  B  A  C
>        |
>        A
> There are a lot of approaches one could take here, on up to pulling
> out a full-on optimal constraint satisfaction system (if we can route
> chips, we should be able to pick a good ordering for accepting CVS
> edits, after all).  A really simple heuristic, though, would be to
> just pick the file whose next commit has the earliest timestamp, then
> group in all the other "next commits" with the same commit message,
> and (maybe) a similar timestamp.  

Pick the earliest first, or more generally: take all the file commits
immediately below the frontier.  Find revs further below the frontier
(up to some small depth or time limit) on other files that might match
them, based on changelog etc (the same grouping you describe, and we
do now).  Eliminate any of those that are not entirely on the frontier
(ie, have some other revision in the way, as with file 2).  Commit the
remaining set in time order. [*]
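
To make the selection step concrete, here is a rough Python sketch of
it.  This is an illustration only, not what the importer does today:
the FileRev type, the committable_groups name and the depth parameter
are all made up for the example, and "matching" is reduced to an exact
comparison of one grouping key (no time-similarity check).

```python
from collections import defaultdict
from typing import NamedTuple


class FileRev(NamedTuple):
    key: str   # hypothetical grouping key, e.g. author + changelog
    time: int  # commit timestamp in seconds


def committable_groups(pending, depth=3):
    """pending maps file name -> list of FileRev below the frontier,
    oldest first.  Returns (key, files) pairs for every group that
    sits entirely on the frontier, earliest first."""
    # For each key, record where it first shows up in each file,
    # looking at most `depth` revs below the frontier.
    positions = defaultdict(dict)
    for name, revs in pending.items():
        for i, rev in enumerate(revs[:depth]):
            positions[rev.key].setdefault(name, i)

    ready = []
    for key, where in positions.items():
        # Drop any group with some other revision in the way.
        if all(i == 0 for i in where.values()):
            start = min(pending[name][0].time for name in where)
            ready.append((start, key, sorted(where)))
    ready.sort()
    return [(key, files) for _, key, files in ready]


# The situation from the diagram: A is blocked by B in file 2.
pending = {
    "1": [FileRev("A", 10)],
    "2": [FileRev("B", 5), FileRev("A", 11)],
    "3": [FileRev("A", 12)],
    "4": [FileRev("C", 7)],
}
print(committable_groups(pending))  # [('B', ['2']), ('C', ['4'])]
```

Once B and C are committed, A is on the frontier in files 1, 2 and 3
and goes in whole on the next round.  An empty result while revs are
still pending is the case where something has to be split.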

If you wind up with an empty set, then you need to split revs, but at
this point you have only conflicting revs on the frontier (i.e. you've
already committed all the other revs you can that might have avoided
this need, whereas we currently might be doing this too often).

For time order, you could look at each rev as having a time window,
from the first to the last matching file commit.  If the rev windows
are non-overlapping, commit them in order.  If the rev windows overlap,
at this point we already know the file changes don't overlap - we
*could* commit these as parallel heads and merge them, to better model
the original developers' overlapping commits.
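
As a sketch of the window idea (again only an illustration; the
plan_batches name and the dict-of-tuples input are assumptions for the
example): sort the revs by window and put any revs whose windows
overlap into the same batch.  A batch of one is an ordinary sequential
commit; a batch of several is a candidate for parallel heads plus a
merge.

```python
def plan_batches(windows):
    """windows maps rev name -> (first_time, last_time).
    Returns a list of batches in time order; revs in the same batch
    have (transitively) overlapping windows."""
    batches = []
    batch_end = None
    for name, (first, last) in sorted(windows.items(),
                                      key=lambda kv: kv[1]):
        if batches and first <= batch_end:
            # Overlaps the current batch: same batch, extend its end.
            batches[-1].append(name)
            batch_end = max(batch_end, last)
        else:
            # Strictly later than everything so far: new batch.
            batches.append([name])
            batch_end = last
    return batches


print(plan_batches({"A": (10, 20), "B": (15, 25), "C": (30, 31)}))
# [['A', 'B'], ['C']]
```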

> Handling file additions could potentially be slightly tricky in this
> model.  I guess it is not so bad, if you model added files as being
> present all along (so you never have to add whole new entries to
> the frontier), with each file starting out in a pre-birth state, and
> then addition of the file is the first edit performed on top of that,
> and you treat these edits like any other edits when considering how to
> advance the frontier.

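CVS allows resurrections too: a file can be removed and later added
again, so it has to be able to drop back into that pre-birth state.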
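
A small sketch of what that means for the per-file edit sequence.  It
assumes each file revision carries the RCS state field, with "dead"
marking a removal; the classify name and the returned labels are made
up for the example.

```python
def classify(states):
    """states: RCS state of each revision of one file, oldest first.
    Returns the kind of edit each revision represents."""
    alive = False  # every file starts out in the pre-birth state
    kinds = []
    for state in states:
        if state == "dead":
            kinds.append("delete" if alive else "noop")
            alive = False
        else:
            # First live rev, or a live rev after a dead one, is an add.
            kinds.append("edit" if alive else "add")
            alive = True
    return kinds


print(classify(["Exp", "Exp", "dead", "Exp"]))
# ['add', 'edit', 'delete', 'add']
```

The second 'add' is the resurrection; it is just another edit on the
frontier, the same as the first one.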

> I have no particular idea on how to handle tags and branches here;
> I've never actually wrapped my head around CVS's model for those :-).
> I'm not seeing any obvious problem with handling them, though.

Tags could be modelled as another 'event' in the file graph, like a
commit. If your frontier advances through both revisions and a 'tag
this revision' event, the same sequencing as above would work. If tags
had been moved, this would wind up with a sequence whereby commits
were interleaved with tagging, and we'd need to split the commits such
that we could end up with a revision matching the tagged content.

> In this approach, incremental conversion is cheap, easy, and robust --
> simply remember what frontier corresponded to the final revision
> imported, and restart the process directly at that frontier.

Hm. Except for the tagging idea above, because tags can be applied
behind a live cvs frontier.

--
Dan.