Junio C Hamano <gitster@xxxxxxxxx> writes: > Jeff King <peff@xxxxxxxx> writes: > >> On Wed, Jan 30, 2008 at 08:51:09AM +1100, Linus Torvalds wrote: >> >>> I definitely can reproduce it, it's horrid. >>> >>> This is from "top" fairly late in the game, but with the thing not even >>> done yet. Current git, pretty much fully (and fairly aggressively) packed >>> current kernel repo, and using "diff.renamelmit=0". >> >> Hrm, setting diff.renamelimit to 0 lets me reproduce (I thought I tried >> it before, but clearly not...). How about this fairly obvious patch? We used to record, for N=src deleted paths and M=dst created paths, (N x M) "struct diff_score" that records how similar each of the pair is, and picked the <src,dst> pairs that gives the best match first, and then went on to worse matches. This sorting is so that two destinations are both found to be similar to a single source, we can process the more similar one first, and when processing the second one, it can notice "Ah, the source I was planning to say I am a copy of is already taken by somebody else" and continue on to match himself with another source with a lessor match (this matters to a change introduced between 1.5.3.X series and 1.5.4-rc, that lets the code to favor unused matches first and then falls back to using already used matches). This instead allocates and keeps only M records in core. For each dst, we compute similarlity with all sources (so the number of similarity estimate computations we do is still N x M), but we keep the best src for each dst. This is essentially to save memory drastically by giving up to come up with better pairing. I guess we could keep a handful best candidates per dst, instead of just one, to further improve on this approach, and such a change should be fairly straightforward. --- diffcore-rename.c | 46 +++++++++++++++++++++++----------------------- 1 files changed, 23 insertions(+), 23 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 3d37725..7e8fdcd 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -473,42 +473,42 @@ void diffcore_rename(struct diff_options *options) if (num_create * num_src > rename_limit * rename_limit) goto cleanup; - mx = xmalloc(sizeof(*mx) * num_create * num_src); + + mx = xmalloc(sizeof(*mx) * num_create); for (dst_cnt = i = 0; i < rename_dst_nr; i++) { - int base = dst_cnt * num_src; struct diff_filespec *two = rename_dst[i].two; + struct diff_score *m; + if (rename_dst[i].pair) continue; /* dealt with exact match already. */ + + m = &mx[dst_cnt]; + m->dst = -1; for (j = 0; j < rename_src_nr; j++) { struct diff_filespec *one = rename_src[j].one; - struct diff_score *m = &mx[base+j]; - m->src = j; - m->dst = i; - m->score = estimate_similarity(one, two, - minimum_score); - m->name_score = basename_same(one, two); + int score, name_score; + score = estimate_similarity(one, two, + minimum_score); + name_score = basename_same(one, two); + if (m->dst < 0 || + score > m->score || + (score == m->score && + name_score > m->name_score)) { + m->score = score; + m->name_score = name_score; + m->src = j; + m->dst = i; + } diff_free_filespec_blob(one); } /* We do not need the text anymore */ diff_free_filespec_blob(two); dst_cnt++; } + /* cost matrix sorted by most to least similar pair */ - qsort(mx, num_create * num_src, sizeof(*mx), score_compare); - for (i = 0; i < num_create * num_src; i++) { - struct diff_rename_dst *dst = &rename_dst[mx[i].dst]; - struct diff_filespec *src; - if (dst->pair) - continue; /* already done, either exact or fuzzy. */ - if (mx[i].score < minimum_score) - break; /* there is no more usable pair. */ - src = rename_src[mx[i].src].one; - if (src->rename_used) - continue; - record_rename_pair(mx[i].dst, mx[i].src, mx[i].score); - rename_count++; - } - for (i = 0; i < num_create * num_src; i++) { + qsort(mx, num_create, sizeof(*mx), score_compare); + for (i = 0; i < num_create; i++) { struct diff_rename_dst *dst = &rename_dst[mx[i].dst]; if (dst->pair) continue; /* already done, either exact or fuzzy. */ - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html