On 6/16/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> On Fri, 16 Jun 2006, Jon Smirl wrote:
> > I am spending over 40% of the time in the kernel. This looks to be
> > caused from forks and starting small tasks, is that the correct
> > interpretation?
>
> Yes. Your kernel profile is all for stuff related to setting up and
> tearing down process space (well, __mutex_lock_slowpath at 1.88% and
> __d_lookup at 1.3% are not, but every single entry before that does seem
> to be about fork/exec/exit). I think it's both the CVS server that
> continually forks/exits (it doesn't actually exec at all - it seems to
> be using fork/exit as a way to control its memory usage, knowing that
> the OS will free all the temporary memory on exit. I think the newer
> CVS development trees don't do this, but that also seems to be why they
> leak memory like mad and eventually run out ;).
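The fork/exit trick described above - letting the kernel reclaim all of a child's memory on exit instead of freeing it piecemeal - can be sketched like this (a hypothetical illustration of the pattern, not CVS's actual code; the function names are mine):

```python
import os

def process_request(data):
    # Hypothetical stand-in for CVS's per-request work: it allocates
    # plenty of temporary memory and never bothers to free it.
    scratch = [data * 1000 for _ in range(100)]
    return len(scratch)

def serve(data):
    """Run one request in a forked child; the OS frees everything on exit."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately. Everything the
        # child allocated is reclaimed by the kernel, so the long-lived
        # parent never accumulates leaks from request handling.
        process_request(data)
        os._exit(0)
    # Parent: just reap the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print(serve("x"))  # 0: child exited cleanly, memory already reclaimed
```

The cost of this pattern is exactly what the profile shows: every request pays for a full fork and exit in the kernel.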
I am using cvs-1.11.21-3.2; I can try running their development tree.
> AND it's git-cvsimport forking and exec'ing git helper processes.
Is it worthwhile to make a library version of these? svn has library versions and they barely show up in oprofile. cvsimport is only using 4-5 low-level git functions.
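To illustrate why an in-process library call beats fork/exec for tiny helpers, here is a rough (hypothetical) comparison - spawning a trivial external process versus calling an equivalent in-process function; the work itself is a placeholder, only the overhead difference matters:

```python
import subprocess
import sys
import time

def helper_via_exec(n):
    # Fork/exec a whole interpreter just to compute one small value,
    # the way cvsimport shells out to small git helper binaries.
    out = subprocess.run([sys.executable, "-c", f"print({n} * 2)"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout)

def helper_in_process(n):
    # The same work as a plain function call (the "library version").
    return n * 2

start = time.perf_counter()
exec_results = [helper_via_exec(i) for i in range(10)]
exec_time = time.perf_counter() - start

start = time.perf_counter()
lib_results = [helper_in_process(i) for i in range(10)]
lib_time = time.perf_counter() - start

assert exec_results == lib_results
print(f"fork/exec: {exec_time:.3f}s, in-process: {lib_time:.6f}s")
```

The results are identical; only the per-call cost differs, and for a million-file import that per-call cost is what ends up dominating the profile.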
> So that process overhead is expected. What I would _not_ have expected is:
>
> > 933646 2.0983 /usr/local/bin/git-read-tree
>
> I don't see why git-read-tree is so hot for you. We should never need to
> read a tree when we're importing something, unless there are tons of
> branches and we switch back and forth between them. I guess mozilla
> really does use a fair number of branches?
Is 1,800 a lot?
> Martin sent out a patch (that I don't think has been merged yet) to
> avoid the git-read-tree overhead when switching branches. Look for an
> email with a subject like "cvsimport: keep one index per branch during
> import"; I suspect that would speed up the git part a lot.
I'll check this out.
> (It will also avoid a few fork/exec's, but you'll still have most of
> them, so I don't think you'll see any really _fundamental_ changes to
> this, but the git-read-tree overhead should be basically gone, and some
> of the libz.so pressure would also be gone with it. It should also avoid
> rewriting the index file, so you'd get lower disk pressure, but it looks
> like none of your problems are really due to IO, so again, that probably
> won't make much of a difference for you).
I have been CPU bound for two days; disk activity is minor. git-cvsimport is 250MB and I have 2GB of disk cache.

After looking at this process for about a week, it doesn't look like processing chronologically is the best strategy. cvsps can quickly work out the changesets - 15 minutes. Then it might be better to walk the CVS files one at a time, generating git IDs for each revision. Next, use the IDs and the changeset info to build the git trees. Finally, pack everything. This strategy would minimize the workload on the CVS files (applying all those deltas to get random revisions). Can git build a repository in this manner? If this is feasible, it may be possible to do all of this in a single pass over the CVS tree by modifying cvsps.

--
Jon Smirl
jonsmirl@xxxxxxxxx
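P.S. For the "generate git IDs for each revision" step above: a git blob ID can be computed without any helper process at all, since it is just the SHA-1 of a small header plus the file content. A sketch (the function name is mine; the header format is git's actual object encoding):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the git object ID for a blob the same way git hash-object
    does: SHA-1 over the header "blob <size>\\0" followed by the raw
    content bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Trees and commits could then be built afterwards from the collected IDs, either with the same kind of header ("tree", "commit") or via plumbing like git commit-tree.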