Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 18 Feb 2010, J.H. wrote:

> > Shortlog:
> > ~~~~~~~~~
> > Jakub Narebski (10):

[removed all "Looks fine to me.]

> >   gitweb/cache.pm - Very simple file based cache
>
> This looks fine to me, and I'm much more keen on using the CHI interface.

First, Cache::Cache is older and more mature than CHI, so it is more
likely for it to be installed (especially that nowadays CHI requires
Moose, modern OOP for Perl).

Second, I think that all caching engines (like Cache::Memcached and
Cache::FastMmap, which are not in Cache::Cache) provide ->get / ->set()
methods.  Not all provide ->compute() method, and those that provide
its equivalent might do it differently (Cache::Memcached uses callbacks),
or name method differently (Cache::FastMmap uses ->get_and_set() instead
of ->compute()).

> >   gitweb/cache.pm - Stat-based cache expiration
> 
> - I think having expires time at -1 (never) is a dangerous default.  I
> think having it as an option is fine, but setting this to something like
> 6 months may be a better option.  I wouldn't expect this to be an issue
> in reality, but I can just see someone screwing up and having a cache
> system that *never* expires.  I think I'd rather see something long vs.
> never.  But that's just my opinion.

You might be right here.  If it was generic cache engine, then default
to never expire is the only possible solution, as you don't know how
cache would be used, and what would be lifetimes involved.  But for
gitweb we can (try to) guess good default expire time.

Please do remember however that there are *two* sets of defaults.
First is hardcoded in ->new() constructor for GitwebCache::SimpleFileCache,
second is in %cache_options in gitweb, and this is the one that can be
changed in $GITWEB_CONFIG.  It is 'expires_on' value in %cache_options
that takes precedence (introduced in next commit), and it is 20 seconds.
 
> - This does have the problem (though this gets fixed later on, again)
> that it would return invalid if there's a process already updating the
> cache.  This might need to be fixed later to understand the locking
> structures on what is/isn't valid (this is mostly a note for me while I
> read through the patches)

Yes, it is possible that more than one process would update cache.  
Using locking is one of possible ways of solving or reducing this
issue, other is 'expires_variance' technique from CHI,... yet another
could be "touching" (setting mtime) of a cache file before regenerating
(refreshing it).

But GitwebCache::SimpleFileCache (err... the name could be better)
would always return correct (fully generated) cache entry, and would
not return partially filled data, even if there are more than one process
generating data in parallel (simultaneously), thanks to "atomic write"
technique (write to temporary file, then rename said file to destination).
File::Temp (or UUID + sequence number) would ensure that processes don't
stomp on each other data.
 
> >   gitweb: Use Cache::Cache compatibile (get, set) output caching
> 
> - It might be worth (in a later patch) enabling the PerlIO layers as
> there was a significant speed improvement in using them, despite their
> non-standardness.

As I wrote and said on #git, I think I was benchmarking incorrect thing;
what is important is the difference between writing large amount of data
(like e.g. snapshot) with different output capturing solutions; the
difference between PerlIO and other solutions was the difference between
12 μs (microseconds) and 18 μs _when profiling_, or something like that.


NOTE: output caching for gitweb consists of three separate issues:
* caching engine (GitwebCache::SimpleFileCache, or other caching engine
  supporting ->get/->set or ->compute (see also below))
* capturing output (current 'select(FILEHANDLE)' solution, possible
  tie filehandle and PerlIO layers solutions, or "tee" solutions...
  but that could be added later)
* output caching driver (a la CGI::Cache, or Plack::Middleware::Cache,
  which gets data from cache and prints it, or captures and prints
  or "tee"s output and saves it to cache).

We can provide other options for capturing output, but this should be
IMVHO left for later commits, isn't it?

> 
> - What if you want to use a different caching library but don't want
> cache.pm ?  That first bit of "if ($caching_enabled) {" might need to be
> wrapped to figure out if we are using our inbuilt caching or an external
> caching system. (just thinking out loud)

It is described in gitweb/README... although I haven't tested it myself
(probably should have at least for one other engine, like 
Cache::FileCache).

In short you can set $cache variable to instance (initialized object)
of selected caching engine, perhaps using %cache_options in its 
constructor, or set $cache to string containing name of cache class,
and add extra options to constructor in %cache_options.
 
> Looks fine to me.
> 
> >   gitweb/cache.pm - Adaptive cache expiration time
> 
> is the commented:
> 
> #return &{$self->{'check_load'}}();
> 
> intended in check_load() ?
> 
> otherwise this looks fine to me.

Leftover from thinking about different ways of writing it.  You can
write either &{$self->{'check_load'}}(), or you can write
$self->{'check_load'}->() - it is a matter of style.

Sidenote: we should probably standarise whether caller or the method
itself is responsible for checking if field is set, in the like methods
(there would be one more like that: the 'generating_info' callback).

> 
> >   gitweb: Use CHI compatibile (compute method) caching
> 
> Looks fine to me.  I think it's fine where it's at in the order myself.

O.K.

> >   gitweb: Show appropriate "Generating..." page when regenerating cache
> 
> I've got a couple of things that will need to be added on top of this
> (mainly to try and guess if the client is dumb or smart) so that if it
> won't support "Generating..." as expected it doesn't get it.  But that can
> come in a later patch.

See subthread http://thread.gmane.org/gmane.comp.version-control.git/136913/focus=137805
in first version of [split] output caching for gitweb series.

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]