On 11/27/06, Petr Baudis <pasky@xxxxxxx> wrote:
> On Mon, Nov 27, 2006 at 05:13:10PM CET, Jon Smirl wrote:
> > The SVN version of the Mozilla repository is about 3GB. It takes
> > around a week of CPU time for svnimport to process it.
>
> Is there a reason why a SVN importer would _have_ to take _longer_
> than a CVS importer? I'd expect the opposite from an optimized
> importer since you don't have to guess the changesets...

These import programs take forever because they fork off git, SVN or CVS millions of times. It really does take a week to fork a CVS process that many times. It's not the application code that is taking a week to run, it is the millions of forks.

As was mentioned in the thread about doing CVS to git import, the trick is to write your own CVS file parser, parse the file once (not once for each revision) and output all of the revisions to the git database in a single pass. When code is structured that way I can import the whole Mozilla repository into git in two hours.

The fast-import back end also works without forking; it just listens for commands on stdin and acts on them, and all of the commands are implemented in a single binary.

The speed of fork in Linux is fine for most purposes, but it is not fine if you are going to fork off good-sized apps several million times. When I measured those forks in oprofile, 60% of the CPU was being consumed by the kernel.

--
Jon Smirl
jonsmirl@xxxxxxxxx
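
A minimal sketch of the single-pass idea described above, written in Python purely for illustration. It assumes the revisions have already been parsed out of the source repository into a simple in-memory form; the Revision type and write_fast_import_stream() are hypothetical names, not part of any existing importer. It emits everything as one git fast-import command stream, so only one git process ever needs to be started. It also assumes plain paths with no spaces or special characters, and a linear history on a single branch.

```python
from typing import BinaryIO, Iterable, NamedTuple


class Revision(NamedTuple):
    """One revision of one file (hypothetical shape, for illustration only)."""
    path: str        # simple path, no spaces or quoting needed (assumption)
    content: bytes   # full file content at this revision
    message: str     # commit message
    author: str      # e.g. "Jane Doe <jane@example.com>"
    timestamp: int   # seconds since the epoch, treated as UTC


def write_fast_import_stream(revisions: Iterable[Revision], out: BinaryIO,
                             ref: str = "refs/heads/master") -> int:
    """Write all revisions as one fast-import stream; return the last mark used."""
    mark = 0
    previous_commit = None
    for rev in revisions:
        # Emit the file content as a blob and remember its mark.
        mark += 1
        blob_mark = mark
        out.write(b"blob\n")
        out.write(b"mark :%d\n" % blob_mark)
        out.write(b"data %d\n" % len(rev.content))
        out.write(rev.content)
        out.write(b"\n")

        # Emit a commit that points the path at that blob.
        mark += 1
        commit_mark = mark
        message = rev.message.encode("utf-8")
        out.write(b"commit %s\n" % ref.encode("utf-8"))
        out.write(b"mark :%d\n" % commit_mark)
        out.write(b"committer %s %d +0000\n"
                  % (rev.author.encode("utf-8"), rev.timestamp))
        out.write(b"data %d\n" % len(message))
        out.write(message)
        out.write(b"\n")
        if previous_commit is not None:
            # Chain onto the commit written just before this one.
            out.write(b"from :%d\n" % previous_commit)
        out.write(b"M 100644 :%d %s\n" % (blob_mark, rev.path.encode("utf-8")))
        out.write(b"\n")
        previous_commit = commit_mark
    return mark
```

To actually import, the stream would be written to the stdin of a single "git fast-import" process started once inside the target repository (for example via subprocess.Popen with stdin=subprocess.PIPE), rather than starting a new git process per revision.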