Re: [PATCH 00/18] Gitweb caching v8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



John, could you please in the future Cc me?  I am interested in gitweb
output caching development.  Thanks in advance.

"John 'Warthog9' Hawley" <warthog9@xxxxxxxxxxxxxx> writes:

> Afternoon everyone,
> 
> (Afternoon is like morning, right?)
>  
> This is the latest incarnation of gitweb w/ caching.  Per the general
> consensus and requests from the recent GitTogether I'm re-submitting 
> my patches.
> 
> Bunch of re-works in the code, and several requested features.  Sadly the
> patch series has balloned as I've been adding things.  It was 3-4 patches,
> it's now 18.  This is based on top of Jakub's v7.2 patch series, but
> it should be more or less clean now.

Could you please rebase it on top of v7.2 version?  The v7.2 patch
series contained a few bugs that needs to be corrected.

> 
> As such there was a bunch of changes that I needed to do to Jakub's tree
> which are indicated in the series.  Why did I do them up as separate things?
> Mainly there's a bunch of history that's getting lost right now between
> going back and forth, and I wanted to have clear patches to discuss
> should further discussion be needed.

I guess that in the final submission (i.e. the one that is to be
merged in into git.git repository) those changes would be squashed in,
isn't it?

> 
> This still differs, by two patches, from whats in production on kernel.org.
> It's missing the index page git:// link, and kernel.org and kernel.org also
> has the forced version matching.  As a note I'll probably let this stew
> another day or so on kernel.org and then I'll push it into the Fedora update
> stream, as there's a couple of things in this patch series that would be 
> good for them to have.

There was some discussion about git:// link in the past; nevertheless
this issue is independent on gitweb caching and can (and should) be
sent as a aeparate patch.

IIRC we agreed that because of backward compatibility forced versions
match is quite useless (in general)...

> 
> There is one additional script I've written that the Fedora folks are using,
> and that might be useful to include, which is an 'offline' cache file generator.
> It basically wraps gitweb.cgi and at the end moves the cache file into the right
> place.  The Fedora folks were finding it took hours to generate their front
> page, and that doing a background generation almost never completed (due to 
> process death).  This was a simple way to handle that.  If people would like
> I can add it in as an additional patch.

Are you detaching the background process?

It would be nice to have it as separate patch.

> 
> v8:
> 	- Reverting several changes from Jakub's change set that make no sense
>                 - is_cacheable changed to always return true - nothing special about
>                   blame or blame_incremental as far as the caching engine is concerned

'blame_incremental' is just another version of 'blame' view.  I have
disabled it when caching is enabled in my rewrite (you instead disabled
caching for 'blame_incremental' in your v7 and mine v7.x) because I
couldn't get it to work together with caching.  Did you check that it
works?

Besides, withou "tee"-ing, i.e. printing output as it is captured,
cached 'blame_data' means that 'blame_incremental' is not incremental,
and therefore it vanishes its advantage over 'blame'.

In the case data is in cache, then 'blame_inremental' doesn't have
advantage over 'blame' either.

>                 - Reverted config file change "caching_enabled" back to "cache_enable" as this
>                   config file option is already in the wild in production code, as are all
>                   current gitweb-caching configuration variables.

[Explitive deleted.]  I dislike strongly this $cache_enable.  I think it
would be better for backward compatibility (should we keep backward
compatibility with out-of-tree patches?) to use the same mechanism as
provided in 

  [PATCHv6/RFC 22/24] gitweb: Support legacy options used by kernel.org caching engine
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163058
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/27ec67ad90ecd56ac3d05f6a9ea49b6faabf7d0a

in my rewrite.  Just set $caching_enabled to true if $cache_enable is
defined and true.

>                 - Reverted change to reset_output as
>                         open STDOUT, ">&", \*STDOUT_REAL;
>                   causes assertion failures:
>                   Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>                   if we encounter an error *BEFORE* we've ever changed the output.

Which Perl version are you using?  Because I think you found error in Perl.
Well, at least I have not happen on this bug.

I have nothing againts using

  open STDOUT, ">&STDOUT_REAL";

though I really prefer that you used lexical filehandles, instead of
"globs" which are global variables.

The following works:

  open STDOUT, '>&', fileno($fh);

Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...

>         - Cleanups there were indirectly mentioned by Jakub
>                 - Elimination of anything even remotely looking like duplicate code
>                         - Creation of isBinaryAction() and isFeedAction()

Could you please do not use mixedCase names?


First, that is what %actions_info from

  [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
  http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f

I have invented for.

Second, why 'isBinaryAction()'?  there isn't something inherently
different between binary (':raw') and text (':utf8') output, as I have
repeatedly said before.  See my rewrite: there is no special case for
binary output (or perhaps binary output as in the case of 'blob_plain'
action).

>         - Adding in blacklist of "dumb" clients for purposes of downloading content
>         - Added more explicit disablement of "Generating..." page

Good, I'll check this.

>         - Added better error handling
>                 - Creation of .err file in the cache directory
>                 - Trap STDERR output into $output_err as this was spewing data prior
>                   to any header information being sent

Why it is needed?  We capture output of "die" via CGI::Util::set_message,
and "warn" output is captured to web server logs... unless you explicitely
use "print STDERR <sth>" -- don't do that instead.

>         - Added hidden field in footer for url & hash of url, which is extremely useful
>           for debugging

Nice idea, I'll see it.  Can it be disabled (information leakage)?

-- 
Jakub Narebski
Poland
ShadeHawk on #git
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]