Re: git svn's performance issue and strange pauses, and other thing

(sorry about the last blank reply - mobile phone and finger accident...)

------------------------------
On Sun, Oct 19, 2014 05:12 BST Eric Wong wrote:

>Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx> wrote:
> The new clone has:
> 
> <--
> $ ls -ltr .git/svn/.caches/
> total 144788
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1166138 Oct  7 13:44 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct  7 13:48 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1133855 Oct  7 13:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct  7 13:53 _rev_list.yaml
> -->
> 
> The old clone has:
>
><snip>
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  40241189 Oct  5 16:42 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct  5 16:49 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak    242547 Oct  5 16:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  24120007 Oct  5 16:50 _rev_list.yaml
> -->
> 
> I had to suspend somewhat around r59000 - but it is interesting to see
> that the max memory consumption of the later part is almost double?
> and it also runs at 100% rather than 60% overall; I don't know what
> to make of that - probably just smaller changes versus
> larger ones, or different time of day and network loads (yes,
> I guess it is just bandwidth-limited?, since the bulk of CPU time is in system
> rather than user).
>
>git-svn memory usage is insane, and we need to reduce it.
>(on Linux, fork() performance is reduced as memory size of the parent
> grows, and I don't think we can easily call vfork() from Perl)
>

Yes, I think the memory consumption is a bit crazy. I ran git svn fetch on
the old clone again and it was a bit slow, so I timed the new one, and here
it is. For just fetching 45 changes, it took 36 minutes and the memory
consumption shot up to over 1GB. (There were one or two mergeinfo lines
in the middle, not shown.)

<---
cd ../R-2/
[Hin-Tak@localhost R-2]$ /usr/bin/time -v git svn fetch --all
	M	src/library/base/R/apply.R
	M	src/library/base/man/apply.Rd
	M	doc/NEWS.Rd
r66721 = e26e52bf4b2cdbe291d5899fd0a449f197aa2133 (refs/remotes/trunk)
...
	M	src/library/tools/R/utils.R
r66765 = c64d1828ada98395892529ce59b5760de1bdc60b (refs/remotes/R-3-1-branch)
---
	Command being timed: "git svn fetch --all"
	User time (seconds): 2042.81
	System time (seconds): 115.98
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 36:13.74
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1019092
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1149
	Minor (reclaiming a frame) page faults: 1482219
	Voluntary context switches: 9470
	Involuntary context switches: 226683
	Swaps: 0
	File system inputs: 358864
	File system outputs: 510680
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
[Hin-Tak@localhost R-2]$ cd ../R
--->
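
As a rough cross-check of the figures above, here is a small sketch that
recomputes the CPU share and converts the peak resident set size. The helper
names cpu_percent and kib_to_mib are made up for illustration, and I am
assuming that the "kbytes" reported by time are units of 1024 bytes.

```shell
# Hypothetical helpers; the inputs are copied from the /usr/bin/time -v output.

# cpu_percent USER_SECONDS SYSTEM_SECONDS ELAPSED_SECONDS
# Prints (user + system) / elapsed as a whole percentage, rounded down.
cpu_percent() {
    awk -v u="$1" -v s="$2" -v e="$3" \
        'BEGIN { printf "%d\n", int((u + s) * 100 / e) }'
}

# kib_to_mib KILOBYTES
# Converts a size in units of 1024 bytes to MiB, with one decimal place.
kib_to_mib() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1024 }'
}

# Elapsed time 36:13.74 is 36 * 60 + 13.74 = 2173.74 seconds.
cpu_percent 2042.81 115.98 2173.74   # prints 99
kib_to_mib 1019092                   # prints 995.2
```

So the run was CPU-bound for essentially its whole duration, almost all of it
in user time, and the peak was about 995 MiB (a little over 1GB in decimal
units).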


> I am somewhat worried about the dramatic difference between the two .svn/.caches -
> check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
> _rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
>
>Calling patterns changed, and it looks like Jakob's changes avoided some
>calls.  The main thing to care about:
>    Does the repository history look right?
>

I'll check soon and report. It looks superficially okay. I suppose
I'd need to check every branch to be sure. I know the fetch history is
different - but do reflog entries (or the equivalent in svn) expire and
get pruned after two weeks?

>The check_cherry_pick cache can be made smaller, too:
>----------------------- 8< -----------------------------
>From: Eric Wong <normalperson@xxxxxxxx>
>Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead
>
>We do not need to store entire lists of commits, only the
>number of incomplete and the first commit for reference.
>This reduces the amount of data we need to store in memory
>and on disk stores.
>

Is there a way of retrospectively compressing/trimming the cache, or better
still, examining it before compressing?

I intend to hold on to both the new and the old clone for a while until
I can reconcile the differences... though I am running the same git svn code
on both now.
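
In the meantime, a minimal sketch of how I could at least compare the cache
files of the two clones side by side. It only reports byte and line counts
and makes no attempt to interpret the YAML; summarize_caches is a made-up
helper name, and the default path is the one from the ls listing above.

```shell
# summarize_caches [DIR]
# Prints "bytes lines path" for each *.yaml file in DIR
# (default: .git/svn/.caches). Hypothetical helper, read-only.
summarize_caches() {
    dir=${1:-.git/svn/.caches}
    for f in "$dir"/*.yaml; do
        # Skip the unexpanded pattern when the directory has no .yaml files.
        [ -f "$f" ] || continue
        bytes=$(wc -c < "$f" | tr -d ' ')
        lines=$(wc -l < "$f" | tr -d ' ')
        printf '%s %s %s\n' "$bytes" "$lines" "$f"
    done
}

# Usage, run once inside each clone:
#   summarize_caches .git/svn/.caches
```

Running it in both R and R-2 and diffing the output should show which caches
differ in entry volume and not just in file size.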

>Signed-off-by: Eric Wong <normalperson@xxxxxxxx>
>---
> perl/Git/SVN.pm | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
>diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>index 25dbcd5..b2d37cb 100644
>--- a/perl/Git/SVN.pm
>+++ b/perl/Git/SVN.pm
>@@ -1537,7 +1537,7 @@ sub _rev_list {
>     @rv;
> }
> 
>-sub check_cherry_pick {
>+sub check_cherry_pick2 {
>     my $base = shift;
>     my $tip = shift;
>     my $parents = shift;
>@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
>             delete $commits{$commit};
>         }
>     }
>-    return (keys %commits);
>+    my @k = (keys %commits);
>+    return (scalar @k, $k[0]);
> }
> 
> sub has_no_changes {
>@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
>         mkpath([$cache_path]) unless -d $cache_path;
> 
>         my %lookup_svn_merge_cache;
>-        my %check_cherry_pick_cache;
>+        my %check_cherry_pick2_cache;
>         my %has_no_changes_cache;
>         my %_rev_list_cache;
> 
>@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
>             LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
>         ;
> 
>-        tie_for_persistent_memoization(\%check_cherry_pick_cache,
>-            "$cache_path/check_cherry_pick");
>-        memoize 'check_cherry_pick',
>+        tie_for_persistent_memoization(\%check_cherry_pick2_cache,
>+            "$cache_path/check_cherry_pick2");
>+        memoize 'check_cherry_pick2',
>             SCALAR_CACHE => 'FAULT',
>-            LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
>+            LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
>         ;
> 
>         tie_for_persistent_memoization(\%has_no_changes_cache,
>@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
>         $memoized = 0;
> 
>         Memoize::unmemoize 'lookup_svn_merge';
>-        Memoize::unmemoize 'check_cherry_pick';
>+        Memoize::unmemoize 'check_cherry_pick2';
>         Memoize::unmemoize 'has_no_changes';
>         Memoize::unmemoize '_rev_list';
>     }
>@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
>         return unless -d $cache_path;
> 
>         for my $cache_file (("$cache_path/lookup_svn_merge",
>-                     "$cache_path/check_cherry_pick",
>+                     "$cache_path/check_cherry_pick", # old
>+                     "$cache_path/check_cherry_pick2",
>                      "$cache_path/has_no_changes")) {
>             for my $suffix (qw(yaml db)) {
>                 my $file = "$cache_file.$suffix";
>@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
>         }
> 
>         # double check that there are no missing non-merge commits
>-        my (@incomplete) = check_cherry_pick(
>+        my ($ninc, $ifirst) = check_cherry_pick2(
>             $merge_base, $merge_tip,
>             $parents,
>             @all_ranges,
>                );
> 
>-        if ( @incomplete ) {
>-            warn "W:svn cherry-pick ignored ($spec) - missing "
>-                .@incomplete." commit(s) (eg $incomplete[0])\n";
>+        if ($ninc) {
>+            warn "W:svn cherry-pick ignored ($spec) - missing " .
>+                "$ninc commit(s) (eg $ifirst)\n";
>         } else {
>             warn
>                 "Found merge parent ($spec): ",
>-- 
>EW
