Excruciatingly slow git-svn imports

I'm trying to import a 9.7G, 130K revision svn repository
but it seems to only import about 6K revisions per day on fast hardware
using a recent git (1.5.5).

This means about 20 days, or more if things slow down as the repo gets
bigger. Are there any tips/tricks on how to most efficiently convert
large repos?
I'm using the svn+ssh protocol for accessing the repository, but the
slowness seems to be due to local inefficiency. An strace -fcp <pid>
over one minute gives the following results:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.46   21.392640       17607      1215           clone
 47.47   19.358882        3983      4860      3645 execve
  0.05    0.019571          16      1216           wait4
  0.01    0.003944           0     14582      1215 open
  0.01    0.002458           0     14580     12150 access
  0.00    0.000797           0      8500           write
  0.00    0.000694           0     26013           read
  0.00    0.000574           0      3693           munmap
  0.00    0.000513           0     20659           close
  0.00    0.000452           0     21918           mmap
  0.00    0.000353           0      1215           stat
  0.00    0.000234           0     12158      1215 lseek
  0.00    0.000155           0     17013           fstat
  0.00    0.000077           0      6075           mprotect
  0.00    0.000076           0      8511           rt_sigaction
  0.00    0.000074           0      6078      6078 ioctl
  0.00    0.000049           0      2432           unlink
  0.00    0.000033           0      2430           dup2
  0.00    0.000033           0      7293           fcntl
  0.00    0.000022           0      3681           brk
  0.00    0.000022           0      1215           getppid
  0.00    0.000019           0      1215           uname
  0.00    0.000019           0      1215           arch_prctl
  0.00    0.000000           0      1215           lstat
  0.00    0.000000           0      1216           pipe
  0.00    0.000000           0        22           mremap
  0.00    0.000000           0      2431           dup
  0.00    0.000000           0      1215           getcwd
  0.00    0.000000           0      2430           getdents64
------ ----------- ----------- --------- --------- ----------------
100.00   40.781691                196296     24303 total

So, 99.93% of the time seems to be spent in clone/execve
(including actual work done by the forked programs).
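
To make the figures above easy to re-derive, here is a minimal Python sketch that parses an `strace -c` summary and reports the share of time in clone/execve and the fraction of execve calls that failed. The function names are hypothetical and the sample holds only the first five rows of the table, so the share it computes is relative to those rows rather than to the full 40.78 seconds.

```python
def parse_strace_summary(text):
    """Parse `strace -c` rows into dicts with seconds, calls and errors."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors column) or 6 (with errors).
        if len(fields) not in (5, 6):
            continue
        try:
            seconds = float(fields[1])
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue  # separator line made of dashes
        name = fields[-1]
        if name == "total":
            continue
        rows[name] = {"seconds": seconds, "calls": calls, "errors": errors}
    return rows


def time_share(rows, names):
    """Fraction of the summed syscall time spent in the given syscalls."""
    total = sum(r["seconds"] for r in rows.values())
    return sum(rows[n]["seconds"] for n in names) / total


# First five rows of the table above, copied verbatim.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.46   21.392640       17607      1215           clone
 47.47   19.358882        3983      4860      3645 execve
  0.05    0.019571          16      1216           wait4
  0.01    0.003944           0     14582      1215 open
  0.01    0.002458           0     14580     12150 access
"""

rows = parse_strace_summary(SAMPLE)
share = time_share(rows, ["clone", "execve"])
failed_execve = rows["execve"]["errors"] / rows["execve"]["calls"]
print(f"clone+execve share: {share:.2%}")
print(f"failed execve fraction: {failed_execve:.2%}")
```

The failed-execve fraction comes out at exactly three in four (3645 of 4860), which matches one successful exec per clone (1215) preceded by three failed attempts.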

In another trace, I found the following execve calls were made:
     22 execve("/homes/bosch/x86_64-linux/bin/git",
      2 execve("/homes/bosch/x86_64-linux/bin/git-commit-tree",
   2842 execve("/homes/bosch/x86_64-linux/bin/git-hash-object",
     22 execve("/opt/gnu/bin/git",
      2 execve("/opt/gnu/bin/git-commit-tree",
   2842 execve("/opt/gnu/bin/git-hash-object",
     22 execve("/opt/local/bin/git",
      2 execve("/opt/local/bin/git-commit-tree",
   2842 execve("/opt/local/bin/git-hash-object",
     22 execve("/opt/local/sbin/git",
      2 execve("/opt/local/sbin/git-commit-tree",
   2842 execve("/opt/local/sbin/git-hash-object",

I don't have git installed in any of /opt/gnu/bin, /opt/local/bin or
/opt/local/sbin. These three directories just happen to come before the
one containing git in my PATH:

bosch:~/git$ echo $PATH
/opt/gnu/bin:/opt/local/bin:/opt/local/sbin:/homes/bosch/x86_64-linux/bin ...
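
The failed execve calls are what a PATH search looks like from the kernel's side: each directory is tried in order until one succeeds. The following Python sketch illustrates that under stated assumptions. The helper names are hypothetical, only the PATH prefix visible above is used (whatever follows it is elided in the post), and empty PATH entries are skipped for simplicity.

```python
import posixpath


def exec_candidates(program, path):
    """Return the paths a PATH search would try for `program`, in order."""
    # Simplification: empty PATH entries (meaning the current directory)
    # are ignored here.
    return [posixpath.join(d, program) for d in path.split(":") if d]


def failed_lookups(program, path, installed_dir):
    """Count the candidates tried before the one inside `installed_dir`."""
    target = posixpath.join(installed_dir, program)
    return exec_candidates(program, path).index(target)


# The PATH prefix shown in the post; anything after it is not known.
PATH_PREFIX = (
    "/opt/gnu/bin:/opt/local/bin:/opt/local/sbin:"
    "/homes/bosch/x86_64-linux/bin"
)
GIT_DIR = "/homes/bosch/x86_64-linux/bin"

candidates = exec_candidates("git-hash-object", PATH_PREFIX)
misses = failed_lookups("git-hash-object", PATH_PREFIX, GIT_DIR)
for c in candidates:
    print(c)
print("failed attempts per exec:", misses)
```

With this PATH every exec of a git helper costs three failed attempts before the fourth succeeds. Moving the git directory to the front of PATH would avoid them, although how much time that saves is not measured here.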

Before trying to brush up my Perl and propose patches for this
(I doubt the extra execve's take much time at all), I was wondering why
we don't open a single stream to git-fast-import and have it do
the heavy lifting. Are there fundamental issues with this?
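
For illustration, here is a minimal Python sketch of what "a single stream to git-fast-import" could look like: blobs and commits are serialized into one byte stream, so a single process replaces one git-hash-object exec per file. It is not how git-svn works and not a proposed patch. The helper names, the committer identity, the timestamp and the file contents are all made up for the example.

```python
import subprocess


def blob(mark, content):
    """Serialize one file's content as a fast-import `blob` command."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)


def commit(mark, ref, committer, when, message, files, parent=None):
    """Serialize a `commit` command; `files` is a list of (path, blob_mark)."""
    out = [
        b"commit %s\n" % ref,
        b"mark :%d\n" % mark,
        b"committer %s %s\n" % (committer, when),
        b"data %d\n%s\n" % (len(message), message),
    ]
    if parent is not None:
        out.append(b"from :%d\n" % parent)
    for path, blob_mark in files:
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path))
    out.append(b"\n")
    return b"".join(out)


def feed(stream, repo_dir):
    """Pipe the whole stream into one `git fast-import` process."""
    subprocess.run(["git", "fast-import"], input=stream,
                   cwd=repo_dir, check=True)


# Hypothetical data standing in for two svn revisions.
WHO = b"Example Author <author@example.invalid>"
stream = b"".join([
    blob(1, b"hello\n"),
    commit(2, b"refs/heads/master", WHO, b"1210000000 +0000",
           b"r1: add README\n", [(b"README", 1)]),
    blob(3, b"hello again\n"),
    commit(4, b"refs/heads/master", WHO, b"1210000060 +0000",
           b"r2: update README\n", [(b"README", 3)], parent=2),
])
print(stream.decode())
```

Calling `feed(stream, repo_dir)` on an initialized repository would import both revisions with one process. A real importer would write to the pipe incrementally instead of building the whole stream in memory.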

  -Geert
