Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching

Petr Baudis <pasky@xxxxxxx> · Wed, 10 Feb 2010 13:02:57 +0100

On Wed, Feb 10, 2010 at 12:28:14PM +0100, Jakub Narebski wrote:
> On Wed, 10 Feb 2010, Petr Baudis wrote:
> > On Wed, Feb 10, 2010 at 02:12:24AM +0100, Jakub Narebski wrote:
> > > On Tue, 9 Feb 2010 at 11:30 +0100, Jakub Narebski wrote:
> > > 
> > > > The cache_fetch subroutine captures output (from STDOUT only, as
> > > > STDERR is usually logged) using either ->push_layer()/->pop_layer()
> > > > from PerlIO::Util submodule (if it is available), or by setting and
> > > > restoring *STDOUT.  Note that only the former could be tested reliably
> > > > to be reliable in t9503 test!
> > > 
> > > Scratch that, I have just checked that (at least for Apache + mod_cgi,
> > > but I don't think that it matters) the latter solution, with setting
> > > and restoring *STDOUT doesn't work: I would get data in cache (so it
> > > can be restored later), but instead of output I would get Internal Server
> > > Error ("The server encountered an internal error or misconfiguration and
> > > was unable to complete your request.") without even a hint what the
> > > problem was.  Sprinkling "die ...: $!" didn't help to catch this error:
> > > I suspect that the problem is with capturing.
> > > 
> > > So we either would have to live with non-core PerlIO::Util or (pure Perl)
> > > Capture::Tiny, or do the 'print -> print $out' patch...
> > 
> > All the magic methods seem to be troublesome, but in that case I'd
> > really prefer a level of indirection instead of filehandle - as is,
> > 'print (...) -> output (...)' ins. of 'print (...) -> print $out (...)'
> > (or whatever). That should be really flexible and completely
> > futureproof, and I don't think the level of indirection would incur any
> > measurable overhead, would it?
> 
> First, it is not only 'print (...) -> print $out (...)'; you need to
> do all those:
> 
>    print  <sth>            ->  print  $out <sth>
>    printf <sth>            ->  printf $out <sth>
>    binmode STDOUT, <mode>  ->  binmode $out, <mode>
> 
> Second, using "tie" on filehandle (on *STDOUT) can be used also for 
> just capturing output, not only for "tee"-ing; what's more to print
> while capturing one has to do extra work.  It is quite similar to
> replacing 'print (...)' with 'output (...)' etc., but using
> tie/untie doesn't require large patch to gitweb.
> 
> Third, as you can see below tie-ing is about 1% slower than using
> 'output (...)', which in turn is less than 10% slower than explicit
> filehandle solution i.e. 'print $out (...)'... and is almost twice
> slower than solution using PerlIO::Util
> 
> Benchmark: timing 50000 iterations of output, perlio, print \$out, tie *STDOUT...
>      output: 1.81462 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 28248.59/s (n=50000)
>      perlio: 1.05585 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 48543.69/s (n=50000)
> print \$out: 1.70027 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 30120.48/s (n=50000)
> tie *STDOUT: 1.82248 wallclock secs ( 1.79 usr +  0.00 sys =  1.79 CPU) @ 27932.96/s (n=50000)
>                Rate tie *STDOUT      output print \$out      perlio
> tie *STDOUT 27933/s          --         -1%         -7%        -42%
> output      28249/s          1%          --         -6%        -42%
> print \$out 30120/s          8%          7%          --        -38%
> perlio      48544/s         74%         72%         61%          --
> 
> Benchmark: running output, perlio, print \$out, tie *STDOUT for at least 10 CPU seconds...
>      output: 10.7199 wallclock secs (10.53 usr +  0.00 sys = 10.53 CPU) @ 28029.63/s (n=295152)
>      perlio: 11.2884 wallclock secs (10.46 usr +  0.00 sys = 10.46 CPU) @ 49967.11/s (n=522656)
> print \$out: 10.5978 wallclock secs (10.43 usr +  0.00 sys = 10.43 CPU) @ 30318.79/s (n=316225)
> tie *STDOUT: 11.3525 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 27635.96/s (n=295152)
>                Rate tie *STDOUT      output print \$out      perlio
> tie *STDOUT 27636/s          --         -1%         -9%        -45%
> output      28030/s          1%          --         -8%        -44%
> print \$out 30319/s         10%          8%          --        -39%
> perlio      49967/s         81%         78%         65%          --
> need
> 
> Attached there is script that was used to produce those results.

Ok, on my machine it's similar:

                Rate      output tie *STDOUT print \$out
output      150962/s          --         -1%         -7%
tie *STDOUT 152769/s          1%          --         -6%
print \$out 162604/s          8%          6%          --

is roughly consistent image coming out of it.

I guess the time spent here is generally negligible in gitweb anyway...
I suggested using output() because I think hacking it would be _very_
_slightly_ easier than tied filehandle, but you are right that doing
that is also really easy; having the possibility to use PerlIO::Util if
available would be non-essentially nice, but requiring it by stock
gitweb is not reasonable, especially seeing that it's not packaged even
for Debian. ;-)

-- 
				Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in
your nerd card. -- Dunbal (464142)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html