Samuel Lucas Vaz de Mello wrote:
> Michael Haggerty wrote:
>> BTW, I don't want to trash "git cvsimport". I'm not brave enough even
>> to try to implement incremental conversions in cvs2git. So the fact
>
> If I run cvs2git several times against a live cvs repo (using the
> same configuration), wouldn't it perform an incremental import?
> Is there anything that would make it produce different commits for
> the history?
>
> I've just made a simple test here performing 2 imports (the 2nd with a
> dozen of new commits not in the 1st) and it seemed to work fine.
>
> I know that it will take the same time/memory as the first import,
> but is there something that can break the repository or produce wrong
> data?

Cool, I'd never thought of that. It's certainly not by design, but as
you've discovered, cvs2git and git *almost* combine to give you an
incremental import.

Alas, it is only "almost". There are many things that can happen in a
CVS repository that would cause the overlapping part of the history to
disagree between runs of cvs2svn. The nastiest are things that a VCS
shouldn't really even allow, but are common in CVS, like

- Retroactively adding a file to a branch or tag. (This is a
  much-beloved feature of CVS.) Since CVS doesn't record the timestamp
  when a symbol is added to a file, cvs2git tries (subject to the
  constraints of other timestamps) to group all such changes into a
  single changeset. So the creation of the symbol would look different
  in runs N vs N+1 of cvs2git: it would contain different files and
  likely have a different timestamp.

- Renaming a file "with history" by renaming or copying the associated
  *,v file in the repository. This retroactively changes the entire
  history of that file and thus of all changesets that involved changes
  to that file.

- Changing the "text vs binary" or keyword expansion mode of a file.
  These properties apply to all revisions of a file, and therefore also
  have a retroactive effect.
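To make the effect of such retroactive changes concrete, here is a toy
sketch in Python. It is not cvs2git code: commit_id, convert, and
common_prefix_len are made-up names, and the hash is only a simplified
stand-in for the way a git commit id depends on its content and its
parent. It shows that appending new changesets leaves the old ids
alone, whereas a retroactive change to an early changeset alters every
id from that point on.

```python
import hashlib

def commit_id(parent_id, timestamp, files):
    """Toy stand-in for a git commit id: depends on parent, timestamp, content."""
    payload = repr((parent_id, timestamp, sorted(files.items()))).encode()
    return hashlib.sha1(payload).hexdigest()

def convert(changesets):
    """Return the chain of commit ids for a list of (timestamp, files) changesets."""
    ids, parent = [], None
    for timestamp, files in changesets:
        parent = commit_id(parent, timestamp, files)
        ids.append(parent)
    return ids

def common_prefix_len(a, b):
    """Number of leading commits on which two conversions agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Run N: three changesets, each mapping file name -> CVS revision.
run_n = [(100, {"a.c": "1.1"}), (200, {"a.c": "1.2"}), (300, {"b.c": "1.1"})]

# Run N+1, benign case: history is only extended.
run_appended = run_n + [(400, {"a.c": "1.3"})]

# Run N+1, retroactive case: the second changeset now contains an extra file.
run_retro = [
    (100, {"a.c": "1.1"}),
    (200, {"a.c": "1.2", "b.c": "1.1"}),
    (300, {"b.c": "1.1"}),
    (400, {"a.c": "1.3"}),
]

ids_n = convert(run_n)
print(common_prefix_len(ids_n, convert(run_appended)))  # 3: all old commits kept
print(common_prefix_len(ids_n, convert(run_retro)))     # 1: diverges at commit 2
```

In the retroactive case the two conversions share only their first
commit, so git would see the rest as unrelated history rather than as
an incremental update.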

But even aside from these retroactive changes, the output of cvs2git is
not deterministic in any practical sense (though I've tried to make it
deterministic given *identical* input). The problem is that there are
so many ambiguities in a CVS history (because CVS doesn't record enough
information) that cvs2git has to use heuristics to decide what
individual file events should be grouped together as commits. The
trickiest part is that the graph of naively inferred changesets can
have cycles in it, and cvs2git uses several heuristics to decide how to
split up changesets so as to remove the cycles. (See our design notes
[1] for all the hairy details.) The CVS commits made between runs N
and N+1 could easily change some of the heuristics' decisions, giving
different results even for the overlapping part of the history.

To add robust support for incremental conversions to cvs2git would
require run N+1 to know about the decisions made in run N, to avoid
contradicting them.

I wonder what would happen if one treated the results of cvs2git
conversions N and N+1 as two separate repositories and merged them
using git. In many cases the merge would probably be trivial, and most
conflicts (except retroactive file renaming!) would probably tend to be
in the recent past and therefore resolvable manually. At least the
repository shouldn't silently become corrupted, which can happen with
other incremental conversion tools.

The final problem is that cvs2git conversions of large CVS repositories
are quite time-consuming, so using it for incremental conversions of
large repositories would be painful. No doubt it could be sped up
considerably, especially if conversion N+1 were privy to the results of
conversion N.

These are all challenging problems and I would welcome volunteers and
be happy to get them started.
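
As a toy illustration of the changeset-cycle problem mentioned above,
here is a short Python sketch. It is not how cvs2git is implemented:
build_graph, find_cycle, and the file_histories data are invented for
the example, and "naive grouping" is assumed to mean lumping together
all file revisions that share a key such as the log message. Each file
orders its own revisions, and those per-file orderings can contradict
each other at the changeset level.

```python
from collections import defaultdict

# Per-file revision order; each revision is labelled with the key of the
# naive changeset it would be grouped into (hypothetical data).
file_histories = {
    "a.c": ["X", "Y"],  # a.c was committed in X first, then in Y
    "b.c": ["Y", "X"],  # b.c was committed in Y first, then in X
}

def build_graph(histories):
    """Add an edge earlier -> later for consecutive revisions of each file."""
    graph = defaultdict(set)
    for keys in histories.values():
        for earlier, later in zip(keys, keys[1:]):
            if earlier != later:
                graph[earlier].add(later)
    return graph

def find_cycle(graph):
    """Depth-first search; return one cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

cycle = find_cycle(build_graph(file_histories))
print(cycle)  # ['X', 'Y', 'X']: X must precede Y and Y must precede X
```

Neither X nor Y can be committed first as a whole, so at least one of
them has to be split. Which one gets split, and where, is exactly the
kind of decision that new CVS commits could tip the other way in a
later run.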

Michael

[1] http://cvs2svn.tigris.org/svn/cvs2svn/trunk/doc/design-notes.txt