On Fri, 2006-06-16 at 13:44 -0400, Jon Smirl wrote:
> I've been extracting versions from cvs and adding them to git now for
> 2.5 days and the process still isn't finished. It is completely CPU
> bound. It's just a loop of cvs co, add it to git, make tree, commit,
> etc.

To do all of mozilla using parsecvs (even with the quadratic algorithm)
takes about three hours on annarchy.freedesktop.org (two dual-core
Opterons with 4GB of memory), including all conversion to packs. The
pack time is a tiny fraction of that.

> What about the cvs2svn algorithm described in the attachment? A ram
> based version could be faster. Compression could be achieved by
> switching from using the full path to a version to the sha1 for it.

Yes, parsecvs currently keeps everything in memory when doing the tree
conversion, which means it grows to a huge size while computing the
full tree of revisions. Computing git tree objects from the top down,
then computing commit objects from the bottom up, should allow us to
free most of that memory during the branch history computation. I'm
starting a rewrite of parsecvs to try this approach and see how well
it works.

If you've looked at the parsecvs source code, you'll notice it's a
mess at present; I started by attempting to do pair-wise tree merges
in a mistaken attempt to convert a linear term into a logarithmic one.
Hacking that code into its present form should be viewed more as a
demonstration of how the overall process can work than as an optimal
expression of the algorithm.

--
keith.packard@xxxxxxxxx
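[Editor's note: the top-down tree / bottom-up commit scheme described in the message can be sketched roughly as below. This is a minimal illustration of the memory-freeing idea, not parsecvs code; the snapshot representation, `tree_hash`, and `convert` are all hypothetical stand-ins.]

```python
# Illustrative sketch only: compute every tree hash first (top down),
# then emit commits oldest-first (bottom up along history), freeing
# each revision's big file table as soon as its commit is written.
import hashlib

def tree_hash(files):
    """Hash a {path: blob_sha} mapping, like a flattened git tree."""
    data = "".join(f"{p}\0{s}\n" for p, s in sorted(files.items()))
    return hashlib.sha1(data.encode()).hexdigest()

def convert(revisions):
    """revisions: oldest-first list of {path: blob_sha} snapshots.

    Pass 1: compute all tree hashes while snapshots are still live.
    Pass 2: chain commits from the oldest end; drop each snapshot
    once its commit exists, so peak memory falls as the pass runs.
    """
    trees = [tree_hash(files) for files in revisions]      # pass 1
    commits, parent = [], None
    for i, tree in enumerate(trees):                       # pass 2
        commit = hashlib.sha1(f"{tree}\0{parent}".encode()).hexdigest()
        commits.append(commit)
        revisions[i] = None    # free the per-revision file table
        parent = commit
    return commits
```

The point of the two passes is that the expensive per-revision state is only needed until its tree hash and commit are computed, so it need not stay resident for the whole conversion.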