Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Mon, 28 Jan 2008, Jeff King wrote: >> >> I tried to reproduce this, but my peak heap allocation was only around >> 20MB. Is your repository fully packed? Not packed at all? Can you use >> valgrind/massif to figure out where the memory is going? > > I definitely can reproduce it, it's horrid. > > This is from "top" fairly late in the game, but with the thing not even > done yet. Current git, pretty much fully (and fairly aggressively) packed > current kernel repo, and using "diff.renamelmit=0". > > 4751 torvalds 20 0 852m 446m 47m R 72 22.4 2:46.58 git-merge-recur > > It finally finished with time reporting: > > 208.15user 3.50system 4:01.50elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k > 238736inputs+4544outputs (8261major+280971minor)pagefaults 0swaps > > where those 280971 minor page faults are what largely indicates how much > memory it used (the technical term for that number is "metric buttload of > memory"). With a bit of tweak, now I am getting these numbers to the rename detection that used to spend 800MB (the peak I observed was somewhere around 430MB). (after patch, in the kernel repository, master at 96b5a46) $ /usr/bin/time git-diff -M -l0 --name-status d19fbe8a7 master >/var/tmp/3 157.20user 1.03system 2:38.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (32major+237315minor)pagefaults 0swaps $ /usr/bin/time git-diff -M -l0 --name-status d19fbe8a7 master >/var/tmp/4 174.00user 2.73system 3:09.55elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6106major+459314minor)pagefaults 0swaps So it is not that much of an improvement, but it seems to help somewhat. The first hunk is about shrinking the diff_score structure; before the patch, it was O(NxM) where N and M are number of rename source and destination candidates, but after the patch it is now O(M), so this shrinkage should not matter, but score is capped to MAX_SCORE (60000) and name_score is actually 0 or 1. We cannot make it 1-bit unsigned bitfield as there is a qsort comparison callback that does (b->name_score - a->name_score). We would need to see where the remaining 400MB is going and try to shrink it, but this would be an improvement so I'll soon be moving this to 'next'. --- diffcore-rename.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 90d06f0..99953e7 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -112,8 +112,8 @@ static int basename_same(struct diff_filespec *src, struct diff_filespec *dst) struct diff_score { int src; /* index in rename_src */ int dst; /* index in rename_dst */ - int score; - int name_score; + unsigned short score; + short name_score; }; static int estimate_similarity(struct diff_filespec *src, @@ -393,7 +393,7 @@ static int find_exact_renames(void) return i; } -#define NUM_CANDIDATE_PER_DST 8 +#define NUM_CANDIDATE_PER_DST 4 static void record_if_better(struct diff_score m[], struct diff_score *o) { int i, worst; - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html