Hi, I find that I have little clue how to convert the following brief test script into a test suitable for t/perf:
#!/bin/sh
rm -rf /tmp/git-test
mkdir /tmp/git-test
cd /tmp/git-test
git init
LIMIT=200000
yes a|head -$LIMIT >data
yes b|head -$LIMIT >data2
git add data data2
git commit -m "split"
git rm data2
yes 'a
b' | head -$(($LIMIT*2)) >data
git add data
git commit -m "combined"
time git blame data >/dev/null
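If it helps to make the question concrete, my naive guess at a t/perf version would look something like the following. The p8002 name, the smaller LIMIT and the use of test_perf_fresh_repo are pure guesses on my part, so treat this as a sketch rather than something known to work:

#!/bin/sh

test_description='quadratic behaviour of git blame on many small fragments'
. ./perf-lib.sh

# Guess at a size that still shows the quadratic behaviour without
# taking minutes per run (the ad-hoc script above uses 200000).
LIMIT=20000
LF='
'

# Guess: start from an empty scratch repository rather than one of the
# stock perf repositories.
test_perf_fresh_repo

test_expect_success 'setup file that blames into alternating fragments' '
	yes a | head -n $LIMIT >data &&
	yes b | head -n $LIMIT >data2 &&
	git add data data2 &&
	git commit -q -m split &&
	git rm -q data2 &&
	yes "a${LF}b" | head -n $((LIMIT * 2)) >data &&
	git add data &&
	git commit -q -m combined
'

test_perf 'blame data' '
	git blame data >/dev/null
'

test_done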
The variable LIMIT is the deciding factor for performance, which, with the code in current master, is quite measurably O(LIMIT^2). I think the script above, with LIMIT=200000, takes about 15 minutes to complete on my computer. Obviously that is excessive: there is little point in choosing sizes that show off more than two orders of magnitude of improvement.

The pathological case is a blamed file that decomposes into lots of small but attributable fragments. One real-world example that is hit rather hard is a large, alphabetically sorted word list that tends to receive insertions and deletions of a few scattered lines at a time.

Should one aim for a genuinely pathological case like the one in this script? Or should one benchmark against one of the stock repositories instead, even though those do not really demonstrate how bad the behavior can get, nor which code passages dominate the quadratic behavior?

-- 
David Kastrup