Re: [PATCH] cvsimport: introduce -L<imit> option to workaround memory leaks

On 5/26/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> I'm doing it too, just for fun.

Well, it's good to not be so alone in our definition of fun ;-)

> Of course, since I'm doing this on a machine that basically has a laptop
> disk, the "just for fun" part is a bit sad. It's waiting for disk about
> 25% of the time ;/

Ouch.

> And it's slow as hell. I really wish we could do better on the CVS import
> front.

Me too. However, I don't think the perl part is so costly anymore.
It's down to waiting on IO. git-write-tree also shows up prominently.
It uses a lot of memory on some writes -- I had thought it would be
cheaper, since it only handles one tree object at a time...

I also have a trivial patch that I haven't posted yet: it runs cvsps
with its output spooled to a tempfile, and then reads the file back.
Serialising the two tasks means that we don't carry cvsps' memory
footprint around during the import itself.
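For what it's worth, the shape of that patch is just "spool, then read
back". Here's a minimal sketch of the idea in Python (the real
git-cvsimport is Perl, the cvsps arguments are simplified, and
process_changeset_line() is a made-up stand-in for the per-changeset
import logic):

import subprocess
import tempfile

def run_cvsps_to_tempfile(module):
    """Run cvsps and spool its changeset list to a temp file.

    Letting cvsps run to completion and exit *before* the import starts
    means its memory footprint is never held alongside the importer's.
    """
    tmp = tempfile.TemporaryFile(mode="w+b")
    subprocess.run(["cvsps", module], stdout=tmp, check=True)
    tmp.seek(0)
    return tmp

def process_changeset_line(line):
    # Placeholder for the real per-changeset work (checkout, hash-object,
    # write-tree, commit-tree, ...).
    pass

def import_module(module):
    patchsets = run_cvsps_to_tempfile(module)
    try:
        for line in patchsets:             # stream the file, never slurp it
            process_changeset_line(line)   # hypothetical per-line handler
    finally:
        patchsets.close()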

> ...
> It's "git-rev-list --objects" that is the memory sucker for me, the
> packing itself doesn't seem to be too bad.


No, you're right, it's git-rev-list that gets called during the
repack. But it was pushing everything it could to swap. Once it didn't
fit in memory, it hit a brick wall :(
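
(For reference, repack is more or less rev-list feeding pack-objects,
which is why rev-list's memory use is what dominates. A rough sketch of
that pipeline, driven from Python here rather than the actual
git-repack script, with the flags simplified:)

import subprocess

# rev-list enumerates every object reachable from every ref, so it ends
# up holding the whole object graph in memory; pack-objects just
# consumes the stream it produces.
rev_list = subprocess.Popen(
    ["git", "rev-list", "--objects", "--all"],
    stdout=subprocess.PIPE,
)
subprocess.run(
    ["git", "pack-objects", "--non-empty", "pack"],  # "pack" = base name for the output files
    stdin=rev_list.stdout,
    check=True,
)
rev_list.stdout.close()
rev_list.wait()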

> The biggest cost seems to be git-write-tree, which is about 0.225 seconds
> for me on that tree on that machine. Which _should_ mean that we could do
> 4 commits a second, but that sure as hell ain't how it works out. It seems
> to do about 1.71 commits a second for me on that tree, which is pretty
> damn pitiful. Some cvs overhead, and probably some other git overhead too.

Well, we _have_ to fetch the file. I guess you are thinking of
extracting it from the RCS ,v file directly? One thing I found that
seemed to speed things up a bit was to set TMPDIR to a directory on
the same partition as the repository.
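
(That trick is nothing more than environment setup around the import;
when driving it from a script it looks roughly like this -- Python,
with made-up paths and CVS details:)

import os
import subprocess

# Assumption: keeping the scratch files on the same filesystem as the
# target repository keeps renames cheap instead of turning them into
# cross-partition copies.
repo = "/srv/import/project"    # hypothetical target directory
scratch = "/srv/import/tmp"     # hypothetical, on the same partition
os.makedirs(scratch, exist_ok=True)

subprocess.run(
    ["git", "cvsimport", "-v",
     "-d", ":pserver:anonymous@cvs.example.org:/cvsroot",  # made-up CVSROOT
     "-C", repo, "module"],                                # made-up module name
    env=dict(os.environ, TMPDIR=scratch),
    check=True,
)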

> (That's a 2GHz Merom, so the fact that you get ~6k commits per hour on
> your 2GHz Opteron is about the same speed - I suspect you're also at least
> partly limited by disk, our numbers seem to match pretty well).

Yup. This is _very_ diskbound.

> 200k commits at 6k commits per hour is about a day and a half (plus the
> occasional packing load). Taking that long to import a CVS archive is
> horrible. But I guess it _is_ several years of work, and I guess you
> really have to do it only once, but still.

And it's a huge CVS archive too.



martin
