Re: [PATCH] Optimize rename detection for a huge diff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, 28 Jan 2008, Jeff King wrote:
>> 
>> I tried to reproduce this, but my peak heap allocation was only around
>> 20MB. Is your repository fully packed? Not packed at all? Can you use
>> valgrind/massif to figure out where the memory is going?
>
> I definitely can reproduce it, it's horrid.
>
> This is from "top" fairly late in the game, but with the thing not even 
> done yet. Current git, pretty much fully (and fairly aggressively) packed 
> current kernel repo, and using "diff.renamelmit=0".
>
> 	4751 torvalds  20   0  852m 446m  47m R   72 22.4   2:46.58 git-merge-recur
>
> It finally finished with time reporting:
>
> 	208.15user 3.50system 4:01.50elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 	238736inputs+4544outputs (8261major+280971minor)pagefaults 0swaps
>
> where those 280971 minor page faults are what largely indicates how much 
> memory it used (the technical term for that number is "metric buttload of 
> memory").

With a bit of tweak, now I am getting these numbers to the
rename detection that used to spend 800MB (the peak I observed
was somewhere around 430MB).

(after patch, in the kernel repository, master at 96b5a46)
$ /usr/bin/time git-diff -M -l0 --name-status d19fbe8a7 master >/var/tmp/3
157.20user 1.03system 2:38.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (32major+237315minor)pagefaults 0swaps

$ /usr/bin/time git-diff -M -l0 --name-status d19fbe8a7 master >/var/tmp/4
174.00user 2.73system 3:09.55elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6106major+459314minor)pagefaults 0swaps

So it is not that much of an improvement, but it seems to help
somewhat.

The first hunk is about shrinking the diff_score structure;
before the patch, it was O(NxM) where N and M are number of
rename source and destination candidates, but after the patch it
is now O(M), so this shrinkage should not matter, but score is
capped to MAX_SCORE (60000) and name_score is actually 0 or 1.
We cannot make it 1-bit unsigned bitfield as there is a qsort
comparison callback that does (b->name_score - a->name_score).

We would need to see where the remaining 400MB is going and try
to shrink it, but this would be an improvement so I'll soon be
moving this to 'next'.

---

 diffcore-rename.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 90d06f0..99953e7 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -112,8 +112,8 @@ static int basename_same(struct diff_filespec *src, struct diff_filespec *dst)
 struct diff_score {
 	int src; /* index in rename_src */
 	int dst; /* index in rename_dst */
-	int score;
-	int name_score;
+	unsigned short score;
+	short name_score;
 };
 
 static int estimate_similarity(struct diff_filespec *src,
@@ -393,7 +393,7 @@ static int find_exact_renames(void)
 	return i;
 }
 
-#define NUM_CANDIDATE_PER_DST 8
+#define NUM_CANDIDATE_PER_DST 4
 static void record_if_better(struct diff_score m[], struct diff_score *o)
 {
 	int i, worst;
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux