On 11/19/06, Petr Baudis <pasky@xxxxxxx> wrote:
> On Sun, Nov 19, 2006 at 06:40:06PM CET, Jon Smirl wrote:
> > Brendan told me that he would not consider Mozilla moving to git until
> > a native Windows version is released so I just dropped the whole
> > thing. It is too much effort and they don't even really want it. They
> > are probably going to switch to SVN. I told him that SVN would end up
> > being a disaster and he got mad at me. That's when I stopped working
> > on cvs2svn/git.
>
> I see. :-(
>
> Could you please publish cvs2git in whatever state you have it so that
> someone else possibly interested could pick it up and finish the missing
> bits? It would be shame if the already done work would end up wasted.

Working on cvs2git is the wrong direction. cvs2svn has been specifically tuned for importing into SVN and they aren't interested in making the architectural changes that git needs. There is a core problem in the way cvs2svn handles CVS symbols.

I posted a three commit example of the problem.

  FileA has rev 1.1 and rev 1.2
  FileB has rev 1.1
  A symbol is created between A 1.1/1.2 and after B 1.1

cvs2svn generated two change sets:

  1) FileA 1.1
  2) FileB 1.1, FileA 1.2

With these two change sets it is impossible to base the symbol simply;
there is no solution without copying. cvs2svn generates the symbol based
on #1 and copies FileB 1.1 from #2.

The alternative is to output three change sets, which is also a valid
representation of the same data:

  1) FileA 1.1
  2) FileB 1.1
  3) FileA 1.2

Now there is a solution for the symbol: base it on change set #2. No
copies needed. By introducing symbol dependencies (which cvs2svn does
not do) you can force the second change set sequence to be generated.
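
To make the symbol dependency idea concrete, here is a minimal sketch in Python. It is not what cvs2svn or any existing importer does. It assumes each file revision is its own change set (real importers group revisions by author, log message and time), and the function name and data layout are made up for illustration. The rule it applies: every revision tagged by a symbol must be emitted before the successor of any tagged revision, so there is one point in the sequence where the tree matches the symbol exactly.

```python
import heapq
from collections import defaultdict


def order_revisions(revisions, symbols):
    """Order file revisions so each symbol has a single clean base point.

    revisions: {file: [rev, ...]} in per-file commit order.
    symbols:   {symbol: {file: tagged_rev}}.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    nodes = [(f, r) for f, revs in revisions.items() for r in revs]
    succs = defaultdict(set)
    indeg = {n: 0 for n in nodes}

    def add_edge(before, after):
        if after not in succs[before]:
            succs[before].add(after)
            indeg[after] += 1

    # Ordinary CVS ordering: 1.1 before 1.2 within one file.
    for f, revs in revisions.items():
        for earlier, later in zip(revs, revs[1:]):
            add_edge((f, earlier), (f, later))

    # Symbol dependencies: the successor of a tagged revision must wait
    # until every revision carrying the same symbol has been emitted.
    for tagged in symbols.values():
        tagged_nodes = list(tagged.items())
        for f, r in tagged_nodes:
            revs = revisions[f]
            i = revs.index(r)
            if i + 1 < len(revs):
                successor = (f, revs[i + 1])
                for t in tagged_nodes:
                    add_edge(t, successor)

    # Kahn's topological sort; the heap only makes tie-breaking
    # deterministic (plain lexicographic order, not revision order).
    ready = [n for n in nodes if indeg[n] == 0]
    heapq.heapify(ready)
    ordered = []
    while ready:
        n = heapq.heappop(ready)
        ordered.append(n)
        for m in succs[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                heapq.heappush(ready, m)
    if len(ordered) != len(nodes):
        raise ValueError("symbol dependencies form a cycle")
    return ordered


example = order_revisions(
    {"FileA": ["1.1", "1.2"], "FileB": ["1.1"]},
    {"SYMBOL": {"FileA": "1.1", "FileB": "1.1"}},
)
print(example)
```

For the three commit example this yields FileA 1.1, FileB 1.1, FileA 1.2, so the symbol can be based on the second change set without copies. Files that do not carry the symbol at all are ignored here, which a real importer could not afford to do.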

SVN allows a label to be made by picking a commit from six months ago
with 80% of the right files in it. They then link to revisions from
later commits to build up the file set needed for the label. Doing it
this way turns every symbol into a little branch. It is fixing the
symptoms rather than fixing the problem of figuring out the correct
construction of the change sets. The SVN people seem to be perfectly
happy with these little branches and they aren't going to change
cvs2svn.

cvs2svn is a nice piece of code and a good thing to look at for
reference. It includes some excellent test cases. The author is a Python
expert and uses every last feature of the language, which makes the code
hard to understand at times. He also loves to refactor things and does
so continuously. This is a major problem for any long lived patches. If
you do choose to work on cvs2svn, just fork it until you are done
developing; don't try to track the refactorings.

There are several other choices. Monotone is getting a pretty good
importer and the author is aware of the problem described above and is
writing his code to avoid it. Monotone is in C++ and makes heavy use of
the Boost C++ template library. Because of this I can't tell what their
code does; it doesn't look like C++ anymore.

git is in C and has high code quality standards. I would just start from
scratch and write the importer over while referencing the existing code.
If anyone is interested in doing this I will be happy to explain what I
know about doing the import accurately and quickly. With the right
algorithms it is possible to import Mozilla CVS with 2GB memory in under
an hour on a desktop machine.

Shawn has written git-fastimport. fastimport takes an input stream of a
simple language and then creates the git repository. The CVS importer
just needs to generate these commands. With the current knowledge of
issues around doing CVS import this problem is not as hard as it used to
be.
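
As a rough idea of what "generate these commands" could look like, here is a small Python sketch that writes a command stream for the three change sets from the example earlier in this mail. It assumes the stream format documented for today's `git fast-import` (commit, mark, committer, data, from, and `M ... inline` file changes), which may differ from the fastimport of this writing. The helper names, committer identity, timestamps and file contents are all invented for illustration.

```python
import io


def data_block(payload: bytes) -> bytes:
    # 'data <byte count>' followed by the raw bytes and an optional newline.
    return b"data %d\n" % len(payload) + payload + b"\n"


def emit_commit(out, ref, mark, committer, when, message, files, parent=None):
    """Write one commit command to `out` (a binary stream).

    files: {path: content bytes}; paths are assumed to need no quoting.
    parent: mark number of the previous commit, or None for the first one.
    """
    out.write(b"commit %s\n" % ref.encode())
    out.write(b"mark :%d\n" % mark)
    out.write(b"committer %s %d +0000\n" % (committer.encode(), when))
    out.write(data_block(message.encode()))
    if parent is not None:
        out.write(b"from :%d\n" % parent)
    for path, content in sorted(files.items()):
        # Inline data keeps the sketch short; blobs with marks also work.
        out.write(b"M 100644 inline %s\n" % path.encode())
        out.write(data_block(content))
    out.write(b"\n")


# Hypothetical identity and contents, not taken from any real repository.
WHO = "CVS Import <cvs-import@example.invalid>"
change_sets = [
    ("FileA 1.1", {"FileA": b"a, rev 1.1\n"}),
    ("FileB 1.1", {"FileB": b"b, rev 1.1\n"}),
    ("FileA 1.2", {"FileA": b"a, rev 1.2\n"}),
]

stream = io.BytesIO()
for i, (log, files) in enumerate(change_sets, start=1):
    emit_commit(stream, "refs/heads/master", i, WHO, 1163958000 + i, log,
                files, parent=i - 1 if i > 1 else None)

print(stream.getvalue().decode())
```

The printed text could be piped into `git fast-import` inside an empty repository. A symbol based on change set #2 would then be just one more ref pointing at mark :2.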

Dozens of attempts have been made at this problem. Shawn and I have
looked at all of these and know where the algorithmic mistakes are. The
only big outstanding problem I know of with import is the issue
described above, which no one has coded a solution for yet. Once a
solution has been found for this problem, the next problem is detecting
when a branch gets merged back into the trunk. After that I think the
problem is fully solved.

-- 
Jon Smirl
jonsmirl@xxxxxxxxx