Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have joined there two separate threads... probably attaching them
in a wrong place.

On Tue, 3 June 2008, Lea Wiemann wrote:
> Rafael Garcia-Suarez wrote:
> >
> > Finally, to avoid forking git-rev-parse too many times, cache its
> > results in a new hash %parent_commits.
>
> I'm not too happy with this:
>
> 1) Minor point: I'm working on caching for the backend right now
> (IOW, basically what you're doing, just centralized in a separate
> module), so you're essentially duplicating work, and you're making it
> (a little) harder for me to refactor gitweb since I have to rip out
> your cache code.

I don't think %parent_commits hash is suitable for caching; it is only
intermediate step, reducing number of git command calls (and forks)
from number of blocks of changes in a blame, to number of distinct 
commits blamed.  From this you would put info into appropriate Perl 
structure.


ATTENTION! This example shows where caching [parsed] data have problems 
compared to front-end caching (caching output).  Caching data is 
(usually) the best solution for pages which are generated from some 
parsed data _as a whole_, or can be generated from parsed data as a 
whole, i.e. heads, tags, summary, projects list, shortlog, history, 
view etc.

Problems occur when we try to cache page with _streaming_ output, such 
as blob view, blame view, diff part of commitdiff etc.  Here better 
solution might be either front-end cache (caching HTML output), or 
back-end caching (caching output of git commands).

> Those few lines won't hurt, but in general I suggest that nobody
> make any larger efforts to cache stuff in gitweb for the next few
> weeks. 

Understandable, we want to avoid conflicts.

By the way, if we agree that version %parent_commits is too intrusive 
dusring GSoC 2008, I think it would be good to accept into maint the 
patch with --revs-only, which fixes real bug, even if it is annoyance 
level only...

> 2) Major point: You're still forking a lot.  The Right Thing is to
> condense everything into a single call -- I believe "git-rev-list
> --parents --no-walk hash hash hash..." is correct and easily
> parsable. Its output seems to be lines of
>      hash parent_1 parent_2 ... parent_n
> with n >= 0.  Can you implement that?  It would be very useful and
> also reusable for me!

This is not a good solution for 'blame' view, which is generated "on the 
fly", by streaming git-blame output via filter.  Above solution goes 
counter to code flow flow: gitweb would have to somehow get list of all 
blamed commits.  (See also note above about caching "stream-generated" 
pages).

Modifying git-blame --porcelain (and --incremental) output has the 
advantage of simple code on gitweb side, retaining "streamed" page 
generation.  It would be one fork less, but I guess that is negligible.   
It would be useful for other blame viewers such as "git gui blame" to 
do similar data mining fast.  Other consumers of git-blame output 
should be (if written correctly) not affected by additional header 
which they don't understand. 

The disadvantage would be for gitweb to require version of git binary 
which has this feature...


Of course implementing get_parents($hash[, $hash...]) in either gitweb, 
or Git.pm, using "git-rev-list --parents --no-walk <args>" could still 
be useful.


----------------------------------------------------------------------
On Tue, 3 June 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
> > I was thinking about extending git-blame porcelain format (and also
> > incremental format, of course) by 'parents' (and perhaps
> > 'original-parents') header...
> 
> Regarding prettiness, I don't find parents in the porcelain output 
> particularly useful, but if other people think they need this, I won't 
> object. :)

The change in gitweb was introduced by commit 244a70e (Blame "linenr" 
link jumps to previous state at "orig_lineno"), by Luben Tuikov; please 
read commit message for explanation how it could be used for data 
mining / browsing annotated history of a file.

As I wrote above, this solution allows to have very simple, streaming 
code in gitweb dealing with "line before change" links.

The porcelain (and incremental) format of git-blame was created in such 
way to allow easy extending it; true and rewritten parents would help 
in navigating annotated file view in history.

It should not, I think, affect your efforts; it is not you that proposed 
to write such extension.

> Regarding performance, it would be good to show that the solution I'm 
> suggesting in my separate is slower than extending git-blame before 
> implementing anything.  (I doubt it matters performance-wise.)

It is one fork more.  And as I wrote above, you need list of all blamed 
commits upfront, which goes counter to currently used "streaming 
output" code flow; it would complicate code, and thus reduce 
performance.

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux