On 6/20/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> On 6/20/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > The plan is to modify rcs2git from parsecvs to create all of the git
> > objects for the tree.
>
> Sounds like a good plan. Have you seen recent discussions about it
> being impossible to repack usefully when you don't have trees (and
> resulting performance problems on ext3)?
No, I will look back in the archives. If needed, we can do a repack
after each file is added. I would hope that git can handle a repack
when the new stuff is 100% deltas from a single file. If I can't pack,
the exploded deltas need about 35GB of disk space. That is an awful lot
to feed to a pack all at once, but it will have trees.
> > cvs2svn seems to do a good job at generating the trees.
>
> No doubt. Gut the last stage, and use all the data in the intermediate
> DBs to run a git import. It's a great plan, and if you can understand
> that Python code... all yours ;-)
How hard would it be to adjust cvsps to use cvs2svn's algorithm for
grouping the changesets? I'd rather do this in a C app, but I haven't
figured out the guts of parsecvs or cvsps well enough to change their
algorithms. There is no requirement to use external databases; sorting
everything in RAM is fine. If you are interested in changing the cvsps
grouping algorithm, I can look at modifying it to write out the
revisions as they are parsed. Then you only need to keep the git sha1
in memory instead of the file:rev when sorting.
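For reference, the basic grouping idea these tools share is: sort the
per-file revisions by commit time, then merge adjacent revisions that
have the same author and log message, fall within a fuzz window, and
don't touch the same file twice. A minimal in-RAM sketch (illustrative
only -- the FileRev fields and the 300-second window are assumptions,
not actual cvsps or cvs2svn code):

```python
from dataclasses import dataclass

WINDOW = 300  # seconds of fuzz between revisions in one changeset (assumed)

@dataclass
class FileRev:
    path: str
    rev: str      # CVS revision string, e.g. "1.42"
    author: str
    log: str
    time: int     # commit time, seconds since epoch

def group_changesets(revs):
    """Group per-file revisions into changesets.

    Revisions are merged into the current changeset when author and log
    match, the gap is within WINDOW, and the file isn't already present
    (a file appearing twice must mean a new changeset).
    """
    changesets = []
    current = None
    for r in sorted(revs, key=lambda r: r.time):
        if (current is not None
                and r.author == current["author"]
                and r.log == current["log"]
                and r.time - current["end"] <= WINDOW
                and r.path not in current["paths"]):
            current["revs"].append(r)
            current["paths"].add(r.path)
            current["end"] = r.time
        else:
            current = {"author": r.author, "log": r.log,
                       "end": r.time, "paths": {r.path}, "revs": [r]}
            changesets.append(current)
    return [c["revs"] for c in changesets]
```

Sorting everything in RAM as above is exactly the "no external
databases" case; cvs2svn's extra sophistication is mostly in how it
splits groups that conflict with branch structure.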
> > exactly sure how the changeset detection algorithms in the three apps
> > compare, but cvs2svn is not having any trouble building changesets for
> > Mozilla. The other two apps have some issues, cvsps throws away some
> > of the branches and parsecvs can't complete the analysis.
>
> Have you tried a recent parsecvs from Keith's tree? There's been quite
> a bit of activity there too. And Keith's interested in sorting out
> incremental imports too, which you need for a reasonable Moz transition
> plan as well.
Keith's parsecvs run ended up in a loop, and mine hit a parsecvs error
and then had memory corruption after about eight hours. That was last
week; I just checked the logs and I don't see any comments about fixing
it. Even after spending eight hours building the changeset info, it is
still going to take a couple of days to retrieve the versions one at a
time and write them to git. Reparsing 50MB delta files n^2/2 times is a
major bottleneck for all three programs.

--
Jon Smirl
jonsmirl@xxxxxxxxx
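(To see where the n^2/2 comes from: if each of n revisions in a ,v file
is retrieved independently, fetching revision k walks roughly k deltas
from the head, so the total work is 1 + 2 + ... + n = n(n+1)/2, about
n^2/2. Streaming the revisions out as the file is parsed touches each
delta only once. A toy cost model -- not code from any of the three
tools:)

```python
def naive_retrieval_cost(n):
    # Fetching revision k independently applies ~k deltas from the head;
    # doing that for every revision costs 1 + 2 + ... + n.
    return sum(range(1, n + 1))  # == n * (n + 1) // 2, ~ n^2/2 for large n

def streaming_cost(n):
    # Parsing the ,v file once and emitting each revision as it is
    # reconstructed applies each delta exactly once.
    return n
```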