Re: git-revert is a memory hog

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Jeff King <peff@xxxxxxxx> writes:
>
>> On Wed, Jan 30, 2008 at 08:51:09AM +1100, Linus Torvalds wrote:
>>
>>> I definitely can reproduce it, it's horrid.
>>> 
>>> This is from "top" fairly late in the game, but with the thing not even 
>>> done yet. Current git, pretty much fully (and fairly aggressively) packed 
>>> current kernel repo, and using "diff.renamelmit=0".
>>
>> Hrm, setting diff.renamelimit to 0 lets me reproduce (I thought I tried
>> it before, but clearly not...).

How about this fairly obvious patch?

We used to record, for N=src deleted paths and M=dst created
paths, (N x M) "struct diff_score" that records how similar each
of the pair is, and picked the <src,dst> pairs that gives the
best match first, and then went on to worse matches.  This
sorting is so that two destinations are both found to be similar
to a single source, we can process the more similar one first,
and when processing the second one, it can notice "Ah, the
source I was planning to say I am a copy of is already taken by
somebody else" and continue on to match himself with another
source with a lessor match (this matters to a change introduced
between 1.5.3.X series and 1.5.4-rc, that lets the code to favor
unused matches first and then falls back to using already used
matches).

This instead allocates and keeps only M records in core.  For
each dst, we compute similarlity with all sources (so the number
of similarity estimate computations we do is still N x M), but
we keep the best src for each dst.  This is essentially to save
memory drastically by giving up to come up with better pairing.

I guess we could keep a handful best candidates per dst, instead
of just one, to further improve on this approach, and such a
change should be fairly straightforward.

---
 diffcore-rename.c |   46 +++++++++++++++++++++++-----------------------
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 3d37725..7e8fdcd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -473,42 +473,42 @@ void diffcore_rename(struct diff_options *options)
 	if (num_create * num_src > rename_limit * rename_limit)
 		goto cleanup;
 
-	mx = xmalloc(sizeof(*mx) * num_create * num_src);
+
+	mx = xmalloc(sizeof(*mx) * num_create);
 	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
-		int base = dst_cnt * num_src;
 		struct diff_filespec *two = rename_dst[i].two;
+		struct diff_score *m;
+
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+
+		m = &mx[dst_cnt];
+		m->dst = -1;
 		for (j = 0; j < rename_src_nr; j++) {
 			struct diff_filespec *one = rename_src[j].one;
-			struct diff_score *m = &mx[base+j];
-			m->src = j;
-			m->dst = i;
-			m->score = estimate_similarity(one, two,
-						       minimum_score);
-			m->name_score = basename_same(one, two);
+			int score, name_score;
+			score = estimate_similarity(one, two,
+						    minimum_score);
+			name_score = basename_same(one, two);
+			if (m->dst < 0 ||
+			    score > m->score ||
+			    (score == m->score &&
+			     name_score > m->name_score)) {
+				m->score = score;
+				m->name_score = name_score;
+				m->src = j;
+				m->dst = i;
+			}
 			diff_free_filespec_blob(one);
 		}
 		/* We do not need the text anymore */
 		diff_free_filespec_blob(two);
 		dst_cnt++;
 	}
+
 	/* cost matrix sorted by most to least similar pair */
-	qsort(mx, num_create * num_src, sizeof(*mx), score_compare);
-	for (i = 0; i < num_create * num_src; i++) {
-		struct diff_rename_dst *dst = &rename_dst[mx[i].dst];
-		struct diff_filespec *src;
-		if (dst->pair)
-			continue; /* already done, either exact or fuzzy. */
-		if (mx[i].score < minimum_score)
-			break; /* there is no more usable pair. */
-		src = rename_src[mx[i].src].one;
-		if (src->rename_used)
-			continue;
-		record_rename_pair(mx[i].dst, mx[i].src, mx[i].score);
-		rename_count++;
-	}
-	for (i = 0; i < num_create * num_src; i++) {
+	qsort(mx, num_create, sizeof(*mx), score_compare);
+	for (i = 0; i < num_create; i++) {
 		struct diff_rename_dst *dst = &rename_dst[mx[i].dst];
 		if (dst->pair)
 			continue; /* already done, either exact or fuzzy. */
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux