On 6/20/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> On 6/20/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > The plan is to modify rcs2git from parsecvs to create all of the git
> > objects for the tree.
>
> Sounds like a good plan. Have you seen recent discussions about it
> being impossible to repack usefully when you don't have trees (and
> resulting performance problems on ext3)?
No, I will look back in the archives. If needed, we can do a repack
after each file is added. I would hope that git can handle a repack
when the new stuff is 100% deltas from a single file. If I can't pack,
the exploded deltas need about 35GB of disk space. That is an awful lot
to feed to a pack all at once, but it will have trees.
> > cvs2svn seems to do a good job at generating the trees.
>
> No doubt. Gut the last stage, and use all the data in the intermediate
> DBs to run a git import. It's a great plan, and if you can understand
> that Python code... all yours ;-)
How hard would it be to adjust cvsps to use cvs2svn's algorithm for
grouping the changesets? I'd rather do this in a C app, but I haven't
figured out the guts of parsecvs or cvsps well enough to change their
algorithms. There is no requirement to use external databases; sorting
everything in RAM is fine. If you are interested in changing the cvsps
grouping algorithm, I can look at modifying it to write out the
revisions as they are parsed. Then you only need to keep the git sha1
in memory instead of the file:rev when sorting.
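For reference, the basic grouping idea these tools share is: sort the
per-file revisions by commit time, then merge adjacent revisions that
have the same author and log message, fall within a fuzz window, and
don't touch the same file twice. A minimal in-RAM sketch (illustrative
only -- the FileRev fields and the 300-second window are assumptions,
not actual cvsps or cvs2svn code):

```python
from dataclasses import dataclass

WINDOW = 300  # seconds of fuzz between revisions in one changeset (assumed)

@dataclass
class FileRev:
    path: str
    rev: str      # CVS revision string, e.g. "1.42"
    author: str
    log: str
    time: int     # commit time, seconds since epoch

def group_changesets(revs):
    """Group per-file revisions into changesets.

    Revisions are merged into the current changeset when author and log
    match, the gap is within WINDOW, and the file isn't already present
    (a file appearing twice must mean a new changeset).
    """
    changesets = []
    current = None
    for r in sorted(revs, key=lambda r: r.time):
        if (current is not None
                and r.author == current["author"]
                and r.log == current["log"]
                and r.time - current["end"] <= WINDOW
                and r.path not in current["paths"]):
            current["revs"].append(r)
            current["paths"].add(r.path)
            current["end"] = r.time
        else:
            current = {"author": r.author, "log": r.log,
                       "end": r.time, "paths": {r.path}, "revs": [r]}
            changesets.append(current)
    return [c["revs"] for c in changesets]
```

Sorting everything in RAM as above is exactly the "no external
databases" case; cvs2svn's extra sophistication is mostly in how it
splits groups that conflict with branch structure.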
> > exactly sure how the changeset detection algorithms in the three apps
> > compare, but cvs2svn is not having any trouble building changesets for
> > Mozilla. The other two apps have some issues, cvsps throws away some
> > of the branches and parsecvs can't complete the analysis.
>
> Have you tried a recent parsecvs from Keith's tree? There's been quite
> a bit of activity there too. And Keith's interested in sorting out
> incremental imports too, which you need for a reasonable Moz transition
> plan as well.
Keith's parsecvs run ended up in a loop, and mine hit a parsecvs error
and then had memory corruption after about eight hours. That was last
week; I just checked the logs and I don't see any comments about fixing
it. Even after spending eight hours building the changeset info, it is
still going to take a couple of days to retrieve the versions one at a
time and write them to git. Reparsing 50MB delta files n^2/2 times is a
major bottleneck for all three programs.

--
Jon Smirl
jonsmirl@xxxxxxxxx
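(To see where the n^2/2 comes from: if each of n revisions in a ,v file
is retrieved independently, fetching revision k walks roughly k deltas
from the head, so the total work is 1 + 2 + ... + n = n(n+1)/2, about
n^2/2. Streaming the revisions out as the file is parsed touches each
delta only once. A toy cost model -- not code from any of the three
tools:)

```python
def naive_retrieval_cost(n):
    # Fetching revision k independently applies ~k deltas from the head;
    # doing that for every revision costs 1 + 2 + ... + n.
    return sum(range(1, n + 1))  # == n * (n + 1) // 2, ~ n^2/2 for large n

def streaming_cost(n):
    # Parsing the ,v file once and emitting each revision as it is
    # reconstructed applies each delta exactly once.
    return n
```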