On 5/26/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
I'm doing it too, just for fun.
Well, it's good to not be so alone in our definition of fun ;-)
Of course, since I'm doing this on a machine that basically has a laptop disk, the "just for fun" part is a bit sad. It's waiting for disk about 25% of the time ;/
Ouch.
And it's slow as hell. I really wish we could do better on the CVS import front.
Me too. However, I don't think the perl part is so costly any more; it's down to waiting on IO. git-write-tree is also prominently there. It takes a lot of memory on some writes -- I had thought it'd be cheaper since it handles one tree object at a time... I also have a trivial patch, which I haven't posted yet, that runs cvsps to a tempfile and then reads the file back. Serialising the tasks means that we don't carry cvsps' memory footprint around during the import itself. ...
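The serialising idea can be sketched with a stand-in producer (seq standing in for cvsps here, since cvsps itself isn't assumed to be installed, and the import step is just a read of the file):

```shell
# Producer-to-tempfile instead of a pipe: the producer runs, writes
# its output, and exits -- freeing its memory -- before the consumer
# starts. With "producer | consumer" both are live at once.
tmp=$(mktemp)
seq 1 5 > "$tmp"           # stand-in for "cvsps ... > tmpfile"
lines=$(wc -l < "$tmp")    # stand-in for the import reading the file
rm -f "$tmp"
echo "$lines"
```

The trade-off is one extra pass over the disk in exchange for never holding both processes' working sets in memory at the same time.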
It's "git-rev-list --objects" that is the memory sucker for me, the packing itself doesn't seem to be too bad.
No, you're right, it's git-rev-list that gets called during the repack. But it was pushing everything it could to swap. Once it didn't fit in memory, it hit a brick wall :(
The biggest cost seems to be git-write-tree, which is about 0.225 seconds for me on that tree on that machine. Which _should_ mean that we could do 4 commits a second, but that sure as hell ain't how it works out. It seems to do about 1.71 commits a second for me on that tree, which is pretty damn pitiful. Some cvs overhead, and probably some other git overhead too.
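Working those numbers through: 0.225 s per git-write-tree caps throughput at 1/0.225 ≈ 4.4 commits a second, against 1.71 observed, so write-tree accounts for well under half the per-commit wall time:

```shell
# Ceiling from write-tree cost alone vs. the observed commit rate.
awk 'BEGIN { printf "%.2f ceiling vs %.2f observed\n", 1/0.225, 1.71 }'
```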
Well, we _have_ to fetch the file. I guess you are thinking of extracting it directly from the RCS ,v file? One thing that I found seemed to speed things up a bit was to declare TMPDIR to be a directory on the same partition.
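A minimal sketch of the TMPDIR trick -- the repo-tmp path is made up for illustration. Keeping temporary files on the repository's partition means they can be rename()d into place rather than copied across devices:

```shell
# Point TMPDIR at a directory on the same filesystem as the repo;
# mktemp (and most tools) honour it.
mkdir -p "$PWD/repo-tmp"        # hypothetical path next to the repo
export TMPDIR="$PWD/repo-tmp"
t=$(mktemp)                     # now lands on the repo's partition
case "$t" in
  "$TMPDIR"/*) echo "tmp on repo partition" ;;
esac
rm -rf "$TMPDIR"
```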
(That's a 2GHz Merom, so the fact that you get ~6k commits per hour on your 2GHz Opteron means we're at about the same speed. I suspect you're also at least partly limited by disk; our numbers seem to match pretty well.)
Yup. This is _very_ diskbound.
200k commits at 6k commits per hour is about a day and a half (plus the occasional packing load). Taking that long to import a CVS archive is horrible. But I guess it _is_ several years of work, and you really only have to do it once. Still.
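The back-of-the-envelope behind "about a day and a half":

```shell
# 200,000 commits at ~6,000 commits/hour.
awk 'BEGIN { h = 200000/6000; printf "%.0f hours, ~%.1f days\n", h, h/24 }'
```

And that's before the periodic repacks, which add their own IO load on top.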
And it's a huge CVS archive too.

martin