Martin Langhoff wrote:
On 6/1/06, Alec Warner <antarus@xxxxxxxxxx> wrote:
After reading the whole thread on this, I'm using a git checkout of
git, cvsps-2.1 and cvs-1.11.12, running overnight in verbose mode with
screen. Hopefully will have a repo in the morning ;)
Good stuff. I am rerunning it to prove (and bench) a complete and
uninterrupted import. So far it has run for 4h 30m, the memory footprint
has grown to 207MB, and it has imported 49750 commits. So I think it
will be done in approx 30 hours on this single-cpu opteron.
Most commits are small, but there is a handful that are downright
massive -- and we hold the whole file list in memory, which I think
explains (most of) the memory growth. I've looked into avoiding
holding the whole file list in memory, but it involves rewriting the
cvsps output parsing loop, which is better left for a rainy day, with
a test case that doesn't take 30 hours to run.
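
To make the "don't hold the whole file list" idea concrete, here is a
minimal sketch of a streaming parse loop. It is not the git-cvsimport
code (that is Perl); it is Python, and it assumes the usual cvsps layout
of a "PatchSet N" header followed by a "Members:" block of indented
"path:old->new" lines. The name iter_members is made up for this
example, and a log message that happens to contain a line starting with
"PatchSet " or "Members:" would confuse it.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_members(stream: TextIO) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (patchset_id, path, old_rev, new_rev) one member at a time.

    Hypothetical helper: only the current line is kept in memory, so a
    massive patchset never builds up a full file list.
    """
    patchset_id = -1
    in_members = False
    for raw in stream:
        line = raw.rstrip("\n")
        if line.startswith("PatchSet "):
            # Assumed header form: "PatchSet <number>"
            patchset_id = int(line.split()[1])
            in_members = False
        elif line.startswith("Members:"):
            in_members = True
        elif in_members:
            entry = line.strip()
            if not entry:
                # A blank line closes the Members block.
                in_members = False
                continue
            # Split on the last ':' so paths containing ':' survive.
            path, _, revs = entry.rpartition(":")
            old_rev, _, new_rev = revs.partition("->")
            yield patchset_id, path, old_rev, new_rev


if __name__ == "__main__":
    sample = (
        "---------------------\n"
        "PatchSet 1 \n"
        "Author: someone\n"
        "Log:\n"
        "first commit\n"
        "\n"
        "Members: \n"
        "\tsrc/a.c:INITIAL->1.1 \n"
        "\n"
    )
    for member in iter_members(io.StringIO(sample)):
        print(member)
```

The caller handles each member as it arrives (for example, fetching
that one file revision) instead of collecting them first, so memory
stays flat no matter how big a single commit is.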
OK, the box this was running on had issues, so I switched to using
pearl.amd64.dev.gentoo.org, a dual core amd64 X2 4600+ with 4 gigs of
ram and plenty of disk. The "problem" now is just conversion time: 30
hours in and I'm at 2004-09-17, but it's been in 2004 all day, so it
seems most of the commits are in the last three years. Are there
architectural issues with doing this in parallel?
Since the repository commits are all in cvs, it should be possible to do
the work in parallel, since you know what all the commits touch. The
concern would be ordering of nodes in the tree; you'd end up building a
bunch of subtrees and patching them together?
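
To illustrate the ordering concern, here is a rough sketch, not how
git-cvsimport works. It assumes the import can be split into two stages:
hashing file contents, which is independent per file and can run in
parallel, and chaining commits, which has to stay in order because each
commit depends on its parent. git_blob_id uses git's real blob hashing
(SHA-1 over "blob <size>\0" plus the content); import_history and its
commit ids are a toy stand-in, not git's commit object format.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def git_blob_id(content: bytes) -> str:
    """Return the id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def import_history(patchsets: List[List[bytes]]) -> Tuple[List[List[str]], List[str]]:
    """Toy two-stage import; patchsets are ordered oldest first.

    Each patchset is a list of file contents. Returns the blob ids per
    patchset and a chain of toy commit ids.
    """
    # Stage 1 (parallel): blob ids depend only on file content.
    with ThreadPoolExecutor() as pool:
        blob_ids = [list(pool.map(git_blob_id, files)) for files in patchsets]

    # Stage 2 (sequential): each toy commit id mixes in its parent's id,
    # so this loop cannot be reordered or split up.
    parent = ""
    chain = []
    for ids in blob_ids:
        parent = hashlib.sha1((parent + "".join(ids)).encode()).hexdigest()
        chain.append(parent)
    return blob_ids, chain


if __name__ == "__main__":
    blobs, commits = import_history([[b"hello\n"], [b"hello\n", b""]])
    print(blobs[0][0])
    print(commits)
```

Under these assumptions only the first stage gains from the second
core; the commit chain itself stays serial.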
-Alec Warner