Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching

Jakub Narebski <jnareb@xxxxxxxxx> · Wed, 10 Feb 2010 19:22:48 +0100

On Wednesday, 10 February 2010 13:02, Petr Baudis wrote:
> On Wed, Feb 10, 2010 at 12:28:14PM +0100, Jakub Narebski wrote:
>> On Wed, 10 Feb 2010, Petr Baudis wrote:
>>> On Wed, Feb 10, 2010 at 02:12:24AM +0100, Jakub Narebski wrote:

[...]
>>>> So we either would have to live with non-core PerlIO::Util or (pure Perl)
>>>> Capture::Tiny, or do the 'print -> print $out' patch...
>>> 
>>> All the magic methods seem to be troublesome, but in that case I'd
>>> really prefer a level of indirection instead of filehandle - as is,
>>> 'print (...) -> output (...)' ins. of 'print (...) -> print $out (...)'
>>> (or whatever). That should be really flexible and completely
>>> futureproof, and I don't think the level of indirection would incur any
>>> measurable overhead, would it?
>> 
>> First, it is not only 'print (...) -> print $out (...)'; [...]
>> 
>> Second, using "tie" on filehandle (on *STDOUT) can be used also for 
>> just capturing output, not only for "tee"-ing; [...]
>> 
>> Third, as you can see below tie-ing is about 1% slower than using
>> 'output (...)', which in turn is less than 10% slower than explicit
>> filehandle solution i.e. 'print $out (...)'... and is almost twice
>> slower than solution using PerlIO::Util
[...]
>>                Rate tie *STDOUT      output print \$out      perlio
>> tie *STDOUT 27636/s          --         -1%         -9%        -45%
>> output      28030/s          1%          --         -8%        -44%
>> print \$out 30319/s         10%          8%          --        -39%
>> perlio      49967/s         81%         78%         65%          --

> Ok, on my machine it's similar:
> 
>                 Rate      output tie *STDOUT print \$out
> output      150962/s          --         -1%         -7%
> tie *STDOUT 152769/s          1%          --         -6%
> print \$out 162604/s          8%          6%          --

I wonder why in my case the 'output (...)' solution was faster than tie,
while in yours tie is faster than 'output'... but I guess a 1% difference
is within the noise level.
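
For anybody who wants to repeat such a comparison, here is a minimal sketch
using the core Benchmark module.  The benchmark script behind the tables
above is not shown in this thread, so everything here is an assumption made
for illustration: the BenchTie class, the CAPTURE glob (used instead of
*STDOUT so that the results can still be printed), the sample line and the
iteration count.  It leaves out the PerlIO::Util variant.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical tied class: PRINT appends its arguments to a scalar buffer.
package BenchTie;
sub TIEHANDLE { my ($class, $bufref) = @_; return bless { buf => $bufref }, $class }
sub PRINT     { my $self = shift; ${ $self->{buf} } .= join('', @_); return 1 }

package main;

my $line = "some line of gitweb output\n";

# Target for the 'print $out' and 'output' variants: an in-memory file.
my $mem = '';
open my $out, '>', \$mem or die "cannot open in-memory file: $!";
sub output { print {$out} @_ }

# Target for the tie variant; a separate glob keeps the real STDOUT usable.
my $tied_buf = '';
tie *CAPTURE, 'BenchTie', \$tied_buf;

# Time a fixed number of iterations silently, then print the comparison.
my $results = timethese(200_000, {
    'tie'        => sub { print CAPTURE $line },
    'output'     => sub { output($line) },
    'print $out' => sub { print {$out} $line },
}, 'none');

cmpthese($results);
```

Differences of a percent or two between runs of such a script are to be
expected, so only the larger gaps are worth reading anything into.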

> [...] is roughly consistent image coming out of it.
> 
> I guess the time spent here is generally negligible in gitweb anyway...
> I suggested using output() because I think hacking it would be _very_
> _slightly_ easier than tied filehandle, but you are right that doing
> that is also really easy; having the possibility to use PerlIO::Util if
> available would be non-essentially nice, but requiring it by stock
> gitweb is not reasonable, especially seeing that it's not packaged even
> for Debian. ;-)

Well, the idea was to use PerlIO::Util if possible, checking for it via

  our $use_perlio_layers = eval { require PerlIO::Util; 1 };

and to fall back to the generic mechanism if it is not present.  Only the
generic mechanism would have to be changed, from manipulating *STDOUT
(*STDOUT = $data_fh etc.) to a tied filehandle.
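
To spell out the detection idiom: the eval block returns 1 when the module
loads and undef when 'require' dies, so the check itself never aborts the
script.  A minimal sketch follows; have_module() and $capture_backend are
hypothetical names used only for illustration, not something gitweb has.

```perl
use strict;
use warnings;

# Return 1 if $module can be loaded, 0 otherwise; never dies.
# have_module() is a hypothetical helper, not part of gitweb.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

our $use_perlio_layers = have_module('PerlIO::Util');

# Choose the capture mechanism once, at startup.
my $capture_backend = $use_perlio_layers ? 'perlio' : 'tie';
print "capture backend: $capture_backend\n";
```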

What we need to be careful about is ':utf8' vs ':raw' mode (IO layer).
In the PerlIO layers solution, and in the 'print <sth> -> print $out <sth>'
solution where $out = $data_fh and $data_fh was opened to an in-memory
file, the data saved in the variable is already converted: it has already
passed through the 'utf8' layer and is saved as bytes.  And if we use
binary mode, it is passed unchanged and is also saved as bytes.  Therefore
we can write to the cache file in ':raw' mode and read from the cache file
in ':raw' mode, and that is why we don't need separate files for text and
for binary output.
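
A small sketch of that behaviour, assuming a temporary file stands in for
the cache file (the real cache layout is not shown here) and with made-up
sample data:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $data = '';
open my $data_fh, '>', \$data or die "cannot open in-memory file: $!";

# Text part: the ':utf8' layer encodes characters on the way in.
binmode $data_fh, ':utf8';
print {$data_fh} "\x{15b}\n";          # U+015B is stored as bytes C5 9B, then 0A

# Binary part: ':raw' passes the bytes through unchanged.
binmode $data_fh, ':raw';
print {$data_fh} "\x89PNG";            # four bytes, stored as-is
close $data_fh;

# Both parts are plain bytes now, so one ':raw' cache file is enough.
my ($cache_fh, $cache_file) = tempfile(UNLINK => 1);
binmode $cache_fh, ':raw';
print {$cache_fh} $data;
close $cache_fh or die "cannot write cache file: $!";

open my $in, '<:raw', $cache_file or die "cannot read cache file: $!";
my $cached = do { local $/; <$in> };
close $in;

printf "%d bytes captured, round trip %s\n",
    length($data), ($cached eq $data ? 'ok' : 'FAILED');
```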

The PRINT method in a class tied to a filehandle gets its argument
_untransformed_, so we have to use utf8::encode($str) if in ':utf8' mode,
and either use PerlIO::get_layers on *STDOUT or provide a BINMODE method
in the tied class to watch for mode changes.  (Note that e.g. for snapshots
we print HTTP headers in ':utf8' mode, and the snapshot itself in ':raw',
i.e. binary, mode.)
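
A rough sketch of such a tied class, taking the BINMODE route rather than
PerlIO::get_layers.  The class name CaptureHandle, its fields and the
_append helper are hypothetical, and this is not the code from the patch
series; it only shows the mechanism.

```perl
use strict;
use warnings;

# Hypothetical capture class; name and fields are illustrative only.
package CaptureHandle;

sub TIEHANDLE {
    my ($class, $bufref) = @_;
    return bless { buf => $bufref, utf8 => 0 }, $class;
}

# Called for binmode(STDOUT, LAYER); remember whether ':utf8' is in effect.
sub BINMODE {
    my ($self, $layer) = @_;
    $layer = ':raw' unless defined $layer;   # plain binmode(FH) means raw
    $self->{utf8} = ($layer =~ /utf-?8/i) ? 1 : 0;
    return 1;
}

# Append one string to the buffer, encoding it first in ':utf8' mode.
sub _append {
    my ($self, $str) = @_;
    utf8::encode($str) if $self->{utf8};
    ${ $self->{buf} } .= $str;
    return 1;
}

sub PRINT {
    my ($self, @args) = @_;
    my $sep = defined($,) ? $, : '';
    my $end = defined($\) ? $\ : '';
    return $self->_append(join($sep, @args) . $end);
}

sub PRINTF {
    my ($self, $fmt, @args) = @_;
    return $self->_append(sprintf($fmt, @args));
}

package main;

my $data = '';
tie *STDOUT, 'CaptureHandle', \$data;

binmode STDOUT, ':utf8';
print "Content-Type: text/plain; charset=utf-8\n\n";
print "\x{15b}roda\n";            # U+015B becomes the two bytes C5 9B

binmode STDOUT, ':raw';
print "\x00\xff";                 # binary payload is stored as-is

untie *STDOUT;
printf "captured %d bytes\n", length $data;
```

Existing 'print' and 'printf' calls stay as they are; only the code that
starts and stops capturing has to know about the tie.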

But all that is doable, and not that much work.  Well, perhaps more
than in the case of 'print -> print $out' etc. and opening an in-memory
file via

  open $data_fh, '>', \$data;

but not that much more, and we don't need an extra global variable $out.
There is also no large gitweb patch, and no worry about somebody
accidentally using 'print <sth>;' or 'printf <fmt>, <sth>;' instead of
'print $out <sth>;' and 'printf $out <fmt>, <sth>;' respectively.

As to how I installed PerlIO::Util for myself (this might be interesting
to other people): in short, I use local::lib bootstrapping and the cpan
client.  From the start I could install some Perl modules from CPAN locally
using the 'cpan' client (included in the perl RPM).  I asked on the #perl
channel on FreeNode what to do, and they recommended local::lib.  After
following the bootstrapping technique described in the local::lib manpage
(see e.g. http://p3rl.org/local::lib), installing PerlIO::Util in
~/perl5 is as simple as 'cpan -i PerlIO::Util' (or using the 'cpan' client
interactively).

You can always put
  use lib '/path/to/perl5/lib';
in your $GITWEB_CONFIG file.

Perhaps adding something like "use lib __DIR__.'/lib';" somewhere near
the beginning of the file (where __DIR__ is an appropriate expression that
expands to the directory gitweb.cgi/gitweb.perl is in) to gitweb would be
a good idea?  Then you would be able to make __DIR__/lib a symlink to local
Perl modules, or put extra modules by hand under __DIR__/lib.
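
One possible spelling of that __DIR__ expression uses the core FindBin
module.  This is only a sketch of the idea, not something gitweb does
today, and since FindBin works out the script location once per
interpreter, persistent environments might need a different expression.

```perl
use strict;
use warnings;

# FindBin (core module) sets $Bin to the directory of the running script.
use FindBin qw($Bin);
use lib "$Bin/lib";

print "extra module directory: $Bin/lib\n";
```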
-- 
Jakub Narebski
Poland