On Thu, Dec 11, 2008 at 01:33:29AM +0100, Jakub Narebski wrote: > Luben Tuikov changed 'lineno' link from leading to commit which gave > current version of given block of lines, to leading to parent of this > commit in 244a70e (Blame "linenr" link jumps to previous state at > "orig_lineno"). This made possible data mining using 'blame' view. > > The current implementation calls rev-parse once per each blamed line > to find parent revision of blamed commit, even when the same commit > appears more than once, which is inefficient. > > This patch attempts to mitigate this issue by storing (caching) > $parent_commit info in %metainfo, which makes gitweb call > git-rev-parse only once per each unique commit in blame output. > > > In the tables below you can see simple benchmark comparing gitweb > performance before and after this patch > > File | L[1] | C[2] || Time0[3] | Before[4] | After[4] > ==================================================================== > blob.h | 18 | 4 || 0m1.727s | 0m2.545s | 0m2.474s > GIT-VERSION-GEN | 42 | 13 || 0m2.165s | 0m2.448s | 0m2.071s > README | 46 | 6 || 0m1.593s | 0m2.727s | 0m2.242s > revision.c | 1923 | 121 || 0m2.357s | 0m30.365s | 0m7.028s > gitweb/gitweb.perl | 6291 | 428 || 0m8.080s | 1m37.244s | 0m20.627s > > File | L/C | Before/After > ========================================= > blob.h | 4.5 | 1.03 > GIT-VERSION-GEN | 3.2 | 1.18 > README | 7.7 | 1.22 > revision.c | 15.9 | 4.32 > gitweb/gitweb.perl | 14.7 | 4.71 > > As you can see the greater ratio of lines in file to unique commits > in blame output, the greater gain from the new implementation. > > Footnotes: > ~~~~~~~~~~ > [1] Lines: > $ wc -l <file> > [2] Individual commits in blame output: > $ git blame -p <file> | grep author-time | wc -l > [3] Time for running "git blame -p" (user time, single run): > $ time git blame -p <file> >/dev/null > [4] Time to run gitweb as Perl script from command line: > $ gitweb-run.sh "p=.git;a=blame;f=<file>" > /dev/null 2>&1 > > The gitweb-run.sh script includes slightly modified (with adjusted > pathnames) code from gitweb_run() function from the test script > t/t9500-gitweb-standalone-no-errors.sh; gitweb config file > gitweb_config.perl contents (again up to adjusting pathnames; in > particular $projectroot variable should point to top directory of git > repository) can be found in the same place. > > > Alternate solutions: > ~~~~~~~~~~~~~~~~~~~~ > Alternate solution would be to open bidi pipe to "git cat-file > --batch-check", (like in Git::Repo in gitweb caching by Lea Wiemann), > feed $long_rev^ to it, and parse its output which has the following > form: > > 926b07e694599d86cec668475071b32147c95034 commit 637 > > This would mean one call to git-cat-file for the whole 'blame' view, > instead of one call to git-rev-parse per each unique commit in blame > output. > > > Yet another solution would be to change use of validate_refname() to > validate_revision() when checking script parameters (CGI query or > path_info), with validate_revision being something like the following: > > sub validate_revision { > my $rev = shift; > return validate_refname(strip_rev_suffixes($rev)); > } > > so we don't need to calculate $long_rev^, but can pass "$long_rev^" as > 'hb' parameter. > > This solution has the advantage that it can be easily adapted to > future incremental blame output. > > Acked-by: Luben Tuikov <ltuikov@xxxxxxxxx> > Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx> Acked-by: Petr Baudis <pasky@xxxxxxx> (though I think the commit message is total overkill for such an obvious change ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html