Re: Significant performance waste in git-svn and friends

Mike Hommey <mh@xxxxxxxxxxxx> wrote:
> Hi,

Hi Mike,

> Being a pervert abusing the way subversion doesn't deal with branches
> and tags, I'm actually not a user of git-svn or git-svnimport, because
> they just can't deal easily with my perversion. So I'm writing a script
> to do the conversion for me, and since I also like to learn new things
> when I'm coding, I'm writing it in ruby.
> 
> Anyways, one of the things I'm trying to convert is my svk repository
> for debian packaging of xulrunner (so, a significant subset of the
> mozilla tree), which doesn't involve a lot of revisions (around 280,
> because I only imported releases or CVS snapshots), but involves a lot
> of files (roughly 20k).
> 
> The first thing I noticed when twisting around the svk repo so that
> git-svn could somehow import it a while ago, is that running git-svn
> was in my case significantly slower than svnadmin dump | svnadmin load
> (more than 2 times slower).
> 
> And now, with my own script, I got the same kind of "slowdown". So I
> investigated it, and it didn't take long to realize that replacing
> git-hash-object by a simple reimplementation in ruby was *way* faster.
> git-hash-object being more than probably what you do the most when you
> import a remote repository, it is not much of a surprise that forking
> thousands of times is a huge performance waste.

I haven't looked at the times in a while, but I suspect that exec()
is the (much bigger) culprit.

I usually import from remote repositories, so I notice network
latency way before I notice local performance problems with git-svn.
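
For readers following along, here is a rough sketch of what an
in-process replacement for git-hash-object can look like. It is in
Python rather than the Ruby or Perl discussed in this thread, it is not
the patch quoted above, and the function names are made up for
illustration. It assumes a SHA-1 repository and only handles blobs:

```python
import hashlib
import os
import zlib


def hash_blob(data: bytes) -> str:
    """Return the object ID git assigns to a blob with this content."""
    # A blob is hashed as "blob <size>\0<content>".
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def write_loose_blob(git_dir: str, data: bytes) -> str:
    """Store data as a loose blob under git_dir/objects and return its ID.

    Hypothetical helper: no fork, no exec, just a hash and one file write.
    """
    oid = hash_blob(data)
    obj_dir = os.path.join(git_dir, "objects", oid[:2])
    path = os.path.join(obj_dir, oid[2:])
    if not os.path.exists(path):
        os.makedirs(obj_dir, exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            # Loose objects are the zlib-compressed header plus content.
            f.write(zlib.compress(b"blob %d\x00" % len(data) + data))
        os.replace(tmp, path)  # rename into place so readers never see a partial file
    return oid
```

hash_blob(b"hello\n") should give the same ID as
`echo hello | git hash-object --stdin`. Doing this once per file in the
importing process avoids one fork() and exec() per object.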

> So, just for the record, I did a lame hack of git-svn to see what kind
> of speedup could happen in git-svn. You can find this lame hack as a
> patch below. I did some tests (with a 1.5.2.1 release) and here are the
> results, importing only the trunk (192 revisions), with no checkout, and
> redirecting stdout to /dev/null:
> 
> original git-svn:
> real    25m1.871s
> user    8m51.593s
> sys     12m31.659s
> 
> patched git-svn:
> real    14m45.870s
> user    7m31.928s
> sys     4m1.047s

That's awesome.

> - It might be worth testing if git-cat-file is called a lot. If so,
>   implementing a simple git-cat-file equivalent that would work for
>   unpacked objects could improve speed.

IIRC git-cat-file is called a lot.  Every modified file needs the
original cat-ed to make use of the delta.
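
A minimal sketch of the "git-cat-file equivalent for unpacked objects"
idea, again in Python and with a hypothetical function name. It assumes
the object is stored loose under objects/ in a SHA-1 repository; for
anything packed it returns None, and the caller would have to fall back
to the real git-cat-file:

```python
import os
import zlib


def read_loose_object(git_dir: str, oid: str):
    """Return (type, content) for a loose object, or None if it is not loose."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    try:
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
    except FileNotFoundError:
        return None  # probably packed; fall back to git-cat-file

    # The decompressed data is "<type> <size>\0<content>".
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), body
```

During an import most of the objects being read back were written
moments earlier by the same process, so they are likely still loose,
which is the case this shortcut covers.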

> The same things obviously apply to git-cvsimport and other scripts
> calling git-hash-object a lot.

Making git-svn use fast-import would be very nice, though I've got a
bunch of other git-svn things that I need to work on.  Allowing Git.pm
to access more of the git internals would be another option...

However, how well/poorly would fast-import work for incremental
fetches throughout the day?
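
To make the question concrete, here is a sketch of the kind of stream an
importer would feed to git-fast-import on stdin. It is Python, the
helper names, ref, and committer are invented for illustration, and it
does not handle paths that need quoting. For a later, incremental run
the commit names the existing branch tip with "from <ref>^0", which is
how the fast-import documentation describes restarting from the current
branch value:

```python
def blob_cmd(mark: int, data: bytes) -> bytes:
    """Emit a 'blob' command that stores data under the given mark."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(data), data)


def commit_cmd(ref: str, mark: int, committer: str, when: int,
               message: str, files, resume: bool = False) -> bytes:
    """Emit a 'commit' command on ref.

    files is a list of (path, blob_mark) pairs.  With resume=True the
    commit is attached to the current tip of ref, as a later incremental
    run would need.
    """
    msg = message.encode("utf-8")
    out = [
        b"commit %s\n" % ref.encode(),
        b"mark :%d\n" % mark,
        b"committer %s %d +0000\n" % (committer.encode(), when),
        b"data %d\n%s\n" % (len(msg), msg),
    ]
    if resume:
        out.append(b"from %s^0\n" % ref.encode())
    for path, blob_mark in files:
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode()))
    out.append(b"\n")
    return b"".join(out)


# One revision of an incremental fetch: a single changed file on trunk.
stream = blob_cmd(1, b"hello\n") + commit_cmd(
    "refs/heads/trunk", 2, "A U Thor <author@example.com>", 0,
    "r193", [("README", 1)], resume=True)
```

Each fetch would still start one fast-import process, but that is one
process per run instead of one per object.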

-- 
Eric Wong