Re: [PATCH 2/3] gitweb: Cache $parent_commit info in git_blame()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 10 Dec 2008, Luben Tuikov wrote:
> --- On Wed, 12/10/08, Jakub Narebski <jnareb@xxxxxxxxx> wrote:

Side note: is it Yahoo web mail interface that removes attributions?

> > > Have you tested this patch that it gives the same commit chain
> > > as before it?
> > 
> > The only difference between precious version and this patch
> > is that now, if you calculate sha-1 of $long_rev^, it is stored in 
> > $metainfo{$long_rev}{'parent'} and not calculated second time.
> 
> Yes, I agree a patch to this effect would improve performance
> proportionally to the history of the lines of a file.

Or rather proportionally to the ratio of number of lines of a file
to number of unique commits (not groups of lines) which are blamed
for given contents of a file.

> So it's a good thing, as most commits change a contiguous block
> of size more than one line of a file.
> 
> "$parent_commit" depends on "$full_rev^" which depends on "$full_rev".
> So as soon as "$full_rev" != "$old_full_rev", you'd know that you need
> to update "$parent_commit".  "$old_full_rev" needs to exist outside
> the  scope of "while (1)".  I didn't see this in the code or in the
> patch. 

You don't need $old_full_rev. We have to save data about commit in
%metainfo hash because information about commit appears in 
"git blame --porcelain" output _once_ per commit. So we use the same
cache to store $full_rev^ in $meta{'parent'}, which means storing
it in $metainfo{$full_rev}{'parent'}.

Now if the commit we saved this info about appears again in git-blame
output, be it in group of lines for which the same commit is blamed,
or later in unrelated chunk, we use saved info.

Let me try to explain it using the following diagram:

  Commit N Line Original code      This patch
  ------------------------------------------------------
  3a4046 1 xxx  rev-parse 3a4046^  rev-parse 3a4046^
         2 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}
         3 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}
  f47c19 5 xxx  rev-parse f47c19^  rev-parse f47c19^
         6 xxx  rev-parse f47c19^  $mi{f47c19}{'parent'}
  3a4046 7 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}  <--
         8 xxx  rev-parse 3a4046^  $mi{3a4046}{'parent'}

where "rev-parse 3a4046^" means call to git-rev-parse, and $mi{<rev>}
accessing $metainfo{$full_rev} (via $meta).
 
In place marked by arrow '<--' you don't need to calculate 3a4046^
again...

> > But I have checked that (at least for single example file)
> > the blame output is identical for before and after this patch.
> 
> I've always tested it on something like "gitweb.perl", etc.

I've checked it on blob.h.  Other good example is README (with boundary
commits) and GIT-VERSION-GEN (with different output between git-blame
--porcelain and --incremental), both of which take much less time than
gitweb/gitweb.perl (see benchmarks in other post).

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux