On 6/16/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> On Fri, 16 Jun 2006, Jon Smirl wrote:
> > I am spending over 40% of the time in the kernel. This looks to be
> > caused from forks and starting small tasks, is that the correct
> > interpretation?
>
> Yes. Your kernel profile is all for stuff related to setting up and
> tearing down process space (well, __mutex_lock_slowpath at 1.88% and
> __d_lookup at 1.3% are not, but every single entry before that does seem
> to be about fork/exec/exit). I think it's both the CVS server that
> continually forks/exits (it doesn't actually exec at all - it seems to
> be using fork/exit as a way to control its memory usage, knowing that
> the OS will free all the temporary memory on exit. I think the newer
> CVS development trees don't do this, but that also seems to be why they
> leak memory like mad and eventually run out ;).
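The fork/exit trick described above - letting the kernel reclaim all of a child's memory on exit instead of freeing it piecemeal - can be sketched like this (a hypothetical illustration of the pattern, not CVS's actual code; the function names are mine):

```python
import os

def process_request(data):
    # Hypothetical stand-in for CVS's per-request work: it allocates
    # plenty of temporary memory and never bothers to free it.
    scratch = [data * 1000 for _ in range(100)]
    return len(scratch)

def serve(data):
    """Run one request in a forked child; the OS frees everything on exit."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately. Everything the
        # child allocated is reclaimed by the kernel, so the long-lived
        # parent never accumulates leaks from request handling.
        process_request(data)
        os._exit(0)
    # Parent: just reap the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print(serve("x"))  # 0: child exited cleanly, memory already reclaimed
```

The cost of this pattern is exactly what the profile shows: every request pays for a full fork and exit in the kernel.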
I am using cvs-1.11.21-3.2; I can try running their development tree.
> AND it's git-cvsimport forking and exec'ing git helper processes.
Is it worthwhile to make a library version of these? svn has library versions and they barely show up in oprofile. cvsimport is only using 4-5 low-level git functions.
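To illustrate why an in-process library call beats fork/exec for tiny helpers, here is a rough (hypothetical) comparison - spawning a trivial external process versus calling an equivalent in-process function; the work itself is a placeholder, only the overhead difference matters:

```python
import subprocess
import sys
import time

def helper_via_exec(n):
    # Fork/exec a whole interpreter just to compute one small value,
    # the way cvsimport shells out to small git helper binaries.
    out = subprocess.run([sys.executable, "-c", f"print({n} * 2)"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout)

def helper_in_process(n):
    # The same work as a plain function call (the "library version").
    return n * 2

start = time.perf_counter()
exec_results = [helper_via_exec(i) for i in range(10)]
exec_time = time.perf_counter() - start

start = time.perf_counter()
lib_results = [helper_in_process(i) for i in range(10)]
lib_time = time.perf_counter() - start

assert exec_results == lib_results
print(f"fork/exec: {exec_time:.3f}s, in-process: {lib_time:.6f}s")
```

The results are identical; only the per-call cost differs, and for a million-file import that per-call cost is what ends up dominating the profile.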
> So that process overhead is expected. What I would _not_ have expected is:
>
> > 933646 2.0983 /usr/local/bin/git-read-tree
>
> I don't see why git-read-tree is so hot for you. We should never need to
> read a tree when we're importing something, unless there are tons of
> branches and we switch back and forth between them. I guess mozilla
> really does use a fair number of branches?
Is 1,800 a lot?
> Martin sent out a patch (that I don't think has been merged yet) to
> avoid the git-read-tree overhead when switching branches. Look for an
> email with a subject like "cvsimport: keep one index per branch during
> import"; I suspect that would speed up the git part a lot.
I'll check this out.
> (It will also avoid a few fork/exec's, but you'll still have most of
> them, so I don't think you'll see any really _fundamental_ changes to
> this, but the git-read-tree overhead should be basically gone, and some
> of the libz.so pressure would also be gone with it. It should also avoid
> rewriting the index file, so you'd get lower disk pressure, but it looks
> like none of your problems are really due to IO, so again, that probably
> won't make much of a difference for you).
I have been CPU bound for two days; disk activity is minor. git-cvsimport is 250MB and I have 2GB of disk cache.

After looking at this process for about a week, it doesn't look like processing chronologically is the best strategy. cvsps can quickly work out the changesets - 15 minutes. Then it might be better to walk the CVS files one at a time, generating git IDs for each revision. Next, use the IDs and the changeset info to build the git trees. Finally, pack everything. This strategy would minimize the workload on the CVS files (applying all those deltas to get random revisions). Can git build a repository in this manner? If this is feasible, it may be possible to do all of this in a single pass over the CVS tree by modifying cvsps.

--
Jon Smirl
jonsmirl@xxxxxxxxx
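P.S. For the "generate git IDs for each revision" step above: a git blob ID can be computed without any helper process at all, since it is just the SHA-1 of a small header plus the file content. A sketch (the function name is mine; the header format is git's actual object encoding):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the git object ID for a blob the same way git hash-object
    does: SHA-1 over the header "blob <size>\\0" followed by the raw
    content bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Trees and commits could then be built afterwards from the collected IDs, either with the same kind of header ("tree", "commit") or via plumbing like git commit-tree.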