Re: Excruciatingly slow git-svn imports

Geert Bosch <bosch@xxxxxxxxxxx> wrote:
> On Apr 29, 2008, at 03:11, Eric Wong wrote:
> 
> >>I've found that git-svn gets slower as it runs. Try interrupting the
> >>clone and running "git svn fetch" -- it should pick up where it left
> >>off and will be MUCH faster if my experience is any indication.
> >>When I clone the big svn repository at work I usually restart it
> >>every 1000 revisions or so and it finishes in a fraction of the time
> >>it takes if I let it do everything in a single run.
> >
> >That's really strange to hear...  The git-svn process itself does not
> >store much state other than the current revision and the log
> >information for the next 100 or so revisions it needs to import.
> >
> >Are you packing the repository?  Which SVN protocol are you using?
> >Does memory usage of git-svn stay stable throughout the run?
> 
> I found the same. After about 5 days (with maybe 10 break/restarts), I
> had a converted repository with all 135K commits and a total size of
> just under 1 GB. The last 100K commits took (much?) less than a day;
> almost all the time was spent in the earlier ones. These commits
> seemed all to have thousands of files, even though most were probably
> the same. I'm sure this repository, which covers 15 years of
> development of a multi-million-line project, has a lot of tags, and it
> seemed that it just had to chew through many copies of the complete
> set of files to find out that they're all the same.
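For reference, the "restart every 1000 revisions or so" workaround quoted above can be scripted. This is a minimal Python sketch, not part of git-svn: the helper names are hypothetical, and it assumes that fetching explicit `git svn fetch -r FIRST:LAST` ranges in separate processes is an acceptable substitute for interrupting and rerunning a plain `git svn fetch`. The last revision has to be supplied by the caller (for example, read from `svn info`).

```python
import subprocess
from typing import Iterator


def revision_chunks(start: int, end: int, size: int = 1000) -> Iterator[tuple[int, int]]:
    """Yield inclusive (first, last) revision ranges covering start..end."""
    if size < 1:
        raise ValueError("size must be positive")
    first = start
    while first <= end:
        last = min(first + size - 1, end)
        yield first, last
        first = last + 1


def fetch_command(first: int, last: int) -> list[str]:
    # Hypothetical helper: builds one git-svn fetch restricted to FIRST:LAST.
    return ["git", "svn", "fetch", "-r", f"{first}:{last}"]


def fetch_in_chunks(start: int, end: int, size: int = 1000,
                    dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run, just print) one git-svn process per chunk."""
    commands = [fetch_command(a, b) for a, b in revision_chunks(start, end, size)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # A fresh process per chunk, mimicking a manual interrupt/restart.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `fetch_in_chunks(1, last, dry_run=False)` starts a new `git svn fetch` process for each chunk, so whatever a single long-running process accumulates is thrown away between chunks; with the default `dry_run=True` it only prints the commands it would run.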

Interesting.  By "These commits seemed all to have thousands of files",
do you mean the first 35K that took up most of the time?  If so, yes,
that's definitely a problem...

git-svn requests a log from SVN containing a list of all paths modified
in each revision.  By default, git-svn only requests log entries for up
to 100 revisions at a time to reduce memory usage.  However, that window
limits the number of revisions per request, not their size, so having
thousands of files modified in each revision would still be
problematic, as would having insanely long commit messages.
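The windowing described above can be modeled in a few lines. This is a minimal Python sketch, not git-svn's actual Perl code: `LogEntry`, `iter_log` and `window_size` are hypothetical names, and `fetch_log` stands in for one SVN log request over a revision range. Later git-svn releases document this window as the `--log-window-size` option, defaulting to 100.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class LogEntry:
    revision: int
    changed_paths: list[str] = field(default_factory=list)
    message: str = ""


# Hypothetical stand-in for a single SVN log request over [first, last].
FetchLog = Callable[[int, int], list[LogEntry]]


def iter_log(fetch_log: FetchLog, first: int, last: int,
             window: int = 100) -> Iterator[LogEntry]:
    """Yield log entries while holding at most one window in memory."""
    if window < 1:
        raise ValueError("window must be positive")
    start = first
    while start <= last:
        end = min(start + window - 1, last)
        # One request per window; the previous window's list is released.
        for entry in fetch_log(start, end):
            yield entry
        start = end + 1


def window_size(entries: list[LogEntry]) -> int:
    """Rough cost of one window: changed paths plus message bytes."""
    return sum(len(e.changed_paths) + len(e.message.encode("utf-8"))
               for e in entries)
```

Because the window bounds only the number of revisions, 100 revisions that each touch 3,000 paths still put 300,000 path entries into a single request, which is the case described above.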

Is this repository public by any chance?  I'd like to be able to take
a look at it in case I have time and have access to decent hardware.
Also, what command-line arguments did you use?

> It's great git-svn can be restarted so well and doesn't get confused
> by uncleanly terminated runs. My final repository is fast and small.
> I'm still struggling with how to properly synchronize branches, but
> that probably is mostly a matter of user education.
> 
> Thanks all for these great tools.

You're welcome, thanks for the feedback!  Restartability in git-svn
is one of the things I focused on from the beginning.

-- 
Eric Wong
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
