I have joined there two separate threads... probably attaching them in a wrong place. On Tue, 3 June 2008, Lea Wiemann wrote: > Rafael Garcia-Suarez wrote: > > > > Finally, to avoid forking git-rev-parse too many times, cache its > > results in a new hash %parent_commits. > > I'm not too happy with this: > > 1) Minor point: I'm working on caching for the backend right now > (IOW, basically what you're doing, just centralized in a separate > module), so you're essentially duplicating work, and you're making it > (a little) harder for me to refactor gitweb since I have to rip out > your cache code. I don't think %parent_commits hash is suitable for caching; it is only intermediate step, reducing number of git command calls (and forks) from number of blocks of changes in a blame, to number of distinct commits blamed. From this you would put info into appropriate Perl structure. ATTENTION! This example shows where caching [parsed] data have problems compared to front-end caching (caching output). Caching data is (usually) the best solution for pages which are generated from some parsed data _as a whole_, or can be generated from parsed data as a whole, i.e. heads, tags, summary, projects list, shortlog, history, view etc. Problems occur when we try to cache page with _streaming_ output, such as blob view, blame view, diff part of commitdiff etc. Here better solution might be either front-end cache (caching HTML output), or back-end caching (caching output of git commands). > Those few lines won't hurt, but in general I suggest that nobody > make any larger efforts to cache stuff in gitweb for the next few > weeks. Understandable, we want to avoid conflicts. By the way, if we agree that version %parent_commits is too intrusive dusring GSoC 2008, I think it would be good to accept into maint the patch with --revs-only, which fixes real bug, even if it is annoyance level only... > 2) Major point: You're still forking a lot. The Right Thing is to > condense everything into a single call -- I believe "git-rev-list > --parents --no-walk hash hash hash..." is correct and easily > parsable. Its output seems to be lines of > hash parent_1 parent_2 ... parent_n > with n >= 0. Can you implement that? It would be very useful and > also reusable for me! This is not a good solution for 'blame' view, which is generated "on the fly", by streaming git-blame output via filter. Above solution goes counter to code flow flow: gitweb would have to somehow get list of all blamed commits. (See also note above about caching "stream-generated" pages). Modifying git-blame --porcelain (and --incremental) output has the advantage of simple code on gitweb side, retaining "streamed" page generation. It would be one fork less, but I guess that is negligible. It would be useful for other blame viewers such as "git gui blame" to do similar data mining fast. Other consumers of git-blame output should be (if written correctly) not affected by additional header which they don't understand. The disadvantage would be for gitweb to require version of git binary which has this feature... Of course implementing get_parents($hash[, $hash...]) in either gitweb, or Git.pm, using "git-rev-list --parents --no-walk <args>" could still be useful. ---------------------------------------------------------------------- On Tue, 3 June 2008, Lea Wiemann wrote: > Jakub Narebski wrote: > > I was thinking about extending git-blame porcelain format (and also > > incremental format, of course) by 'parents' (and perhaps > > 'original-parents') header... > > Regarding prettiness, I don't find parents in the porcelain output > particularly useful, but if other people think they need this, I won't > object. :) The change in gitweb was introduced by commit 244a70e (Blame "linenr" link jumps to previous state at "orig_lineno"), by Luben Tuikov; please read commit message for explanation how it could be used for data mining / browsing annotated history of a file. As I wrote above, this solution allows to have very simple, streaming code in gitweb dealing with "line before change" links. The porcelain (and incremental) format of git-blame was created in such way to allow easy extending it; true and rewritten parents would help in navigating annotated file view in history. It should not, I think, affect your efforts; it is not you that proposed to write such extension. > Regarding performance, it would be good to show that the solution I'm > suggesting in my separate is slower than extending git-blame before > implementing anything. (I doubt it matters performance-wise.) It is one fork more. And as I wrote above, you need list of all blamed commits upfront, which goes counter to currently used "streaming output" code flow; it would complicate code, and thus reduce performance. -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html