On Sun, 7 Feb 2010, Jakub Narebski wrote:

> There is new version of this series in gitweb/cache-kernel-v2 in my
> git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz.
> Now all commits have proper description (for first series one had to
> read comment section in emails for commit description), [...]

Below are the commit messages for the gitweb/cache-kernel-v2 branch
after rebase and fixups:

commit 560e2ab10d0f8457fbeca7a26814ff3e32396f7b
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 11:27:22 2010 +0100

gitweb: href(..., -path_info => 0|1)

If the named boolean option -path_info is passed to the href()
subroutine, use its value to decide whether to generate a path_info
URL. If this option is not passed, href() queries the 'pathinfo'
feature to check whether to generate a path_info URL (if generating a
path_info link is possible at all).

href(-replay=>1, -path_info=>0) is meant to be used to generate a key
for caching gitweb output; an alternate solution would be to use
freeze() from Storable (a core module) on the %input_params hash (or
its reference), e.g.:

  $key = freeze \%input_params;

or some other serialization technique.

While at it, document extra options/flags to href().

Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/gitweb.perl | 7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

commit dd6e8dc27d5b799bd2a1aed03738195dfe3bc5e7
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:13:06 2010 +0100

gitweb/cache.pm - Very simple file based caching

This is the first step towards implementing a file based output
(response) caching layer like the one used on large sites such as
kernel.org.

This patch introduces the GitwebCaching::SimpleFileCache package,
which follows the Cache::Cache / CHI interface, although it does not
implement it fully. The intent of following an established convention
is to make it possible in the future to replace our simple file based
cache, e.g. by one using memcached.

As in the original patch by John 'Warthog9' Hawley (J.H.)
(the one this commit intends to be an incremental step towards), the
data is stored in the cache as-is, without adding metadata (like an
expiration date), and without serialization (which means only scalar
data can be stored).

To be implemented (from the original patch by J.H.):
* cache expiration (based on file stats, current time and global
  expiration time); currently elements in cache do not expire
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Possible extensions (beyond what was in the original patch):
* (optionally) show information about cache utilization
* AJAX (JavaScript-based) progress indicator
* JavaScript code to update relative dates in cached output
* make cache size-aware (try not to exceed a specified maximum size)
* utilize the X-Sendfile header (or equivalent) to show cached data
  (optional, as it makes sense only if the web server supports the
  sendfile feature and has it enabled)
* variable expiration feature from CHI, allowing items to expire a bit
  earlier than the stated expiration time to prevent cache miss
  stampedes (although locking, if available, should take care of
  this).

The code of the GitwebCaching::SimpleFileCache package in
gitweb/cache.pm was heavily based on the file-based cache in the
Cache::Cache package, i.e. on Cache::FileCache, Cache::FileBackend and
Cache::BaseCache, and on the file-based cache in CHI, i.e. on
CHI::Driver::File and CHI::Driver (including implementing atomic
write, something that the original patch lacks).

This patch does not yet enable output caching in gitweb (it doesn't
have all the required features yet); on the other hand it includes
tests, currently testing only the cache Perl API.
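
The get/set core described above, including the atomic write, can be
sketched in Python as follows. This is an illustration, not the Perl
code in gitweb/cache.pm. It assumes values are bytes (the analogue of
storing only scalar data); the key-to-path layout (an MD5 digest split
into a two-character subdirectory) and the method name `path_to_key`
are hypothetical choices. The atomic write uses the common technique
of writing a temporary file and renaming it over the target.

```python
import hashlib
import os
import tempfile


class SimpleFileCache:
    """Sketch of a file-based cache: get/set only, no expiration, no metadata."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def path_to_key(self, key):
        # Hypothetical layout: hash the key so any string becomes a safe file
        # name, fanned out into a subdirectory named after the first two digits.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest[:2], digest[2:])

    def get(self, key):
        # A cache miss (no file yet) is reported as None.
        try:
            with open(self.path_to_key(key), "rb") as fh:
                return fh.read()
        except FileNotFoundError:
            return None

    def set(self, key, data):
        path = self.path_to_key(key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Atomic write: write a temporary file in the same directory, then
        # rename it over the target, so readers never see a partial entry.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
        return data
```

Because the rename happens within one directory, a concurrent reader
sees either the old file or the complete new one, never a half-written
entry.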
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 | 269 +++++++++++++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh       |  32 +++++
 t/t9503/test_cache_interface.pl |  84 ++++++++++++
 t/test-lib.sh                   |   3 +
 4 files changed, 388 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

commit 3914e7da792fec50fcc64c0e644d54cf4451703a
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:13:17 2010 +0100

gitweb/cache.pm - Stat-based cache expiration

Add stat-based cache expiration to the file-based
GitwebCache::SimpleFileCache. Contrary to the way other caching
interfaces such as Cache::Cache and CHI do it, the time in which a
cache element expires is a _global_ value associated with the cache
instance, and is not a local property of the cache entry. (Currently
a cache entry does not store any metadata associated with it... which
means that there is no need for serialization / marshalling / freezing
and thawing.) The default expire time is -1, which means never expire.

To check if a cache entry is expired, GitwebCache::SimpleFileCache
compares the difference between the mtime (last modification time) of
the cache file and the current time with the (global) time to expire.
This is done using the CHI-compatible is_valid() method.

Add some tests checking that expiring works correctly (on the level of
the API).
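
A minimal sketch of this stat-based check, in Python for illustration.
Here `is_valid` is a hypothetical stand-alone function taking the cache
file path and the cache-wide `expires_in`; the optional `now` argument
exists only to make it testable. Whether an entry exactly `expires_in`
seconds old still counts as fresh is an assumption (here it does not).

```python
import os
import time


def is_valid(path, expires_in, now=None):
    """Return True if the cache file at `path` exists and has not expired.

    `expires_in` is one cache-wide value in seconds; -1 means never expire.
    """
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False  # no entry at all
    if expires_in < 0:
        return True
    if now is None:
        now = time.time()
    # Fresh while the file's age (now - mtime) is below the global expire time.
    return now - mtime < expires_in
```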
To be implemented (from the original patch by J.H.):
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   34 ++++++++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl |   10 ++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)

commit a55625cb0f2d6c08a28e774fd2ddb4e5347a24b3
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:13:27 2010 +0100

gitweb: Use Cache::Cache compatible (get, set) output caching

This commit actually adds output caching to gitweb, as we now have the
minimal features required for it in GitwebCache::SimpleFileCache (a
'dumb' but fast file-based cache engine). To enable the cache you need
at least to set $caching_enabled to true in the gitweb config, and to
copy cache.pm from gitweb/ alongside gitweb.cgi; this is described in
more detail in the new "Gitweb caching" section in gitweb/README.

Currently the cache support related subroutines in cache.pm (which are
outside the GitwebCache::SimpleFileCache package) are not well
separated from the gitweb script itself; cache.pm lacks encapsulation.
cache.pm assumes that there is an href() subroutine and an %actions
variable, that $actions{$action} exists (where $action is the
parameter passed to cache_fetch), and that it is a code reference (see
also the comments in t/t9503/test_cache_interface.pl). This is a
remaining artifact from the original patch by J.H. (which also had a
cache_fetch() subroutine).
Gitweb itself directly uses only cache_fetch, to get a page from the
cache or to generate a page and save it to the cache, and cache_stop,
to be used in the die_error subroutine, as error pages are currently
not cached.

The cache_fetch subroutine captures output (from STDOUT only, as
STDERR is usually logged) using either ->push_layer()/->pop_layer()
from the PerlIO::Util submodule (if it is available), or by setting
and restoring *STDOUT. Note that only the former can be tested
reliably in the t9503 test!

Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between the 'text/html'
  mimetype and 'application/xhtml+xml') when caching, as there is no
  content-type negotiation done when retrieving a page from the cache.
  Uses the 'text/html' mimetype, which can be used by all browsers.
* Disables timing info (how much time it took to generate the original
  page, and how many git commands it took), and in its place shows
  when the page was originally generated (in the GMT / UTC timezone).

Add basic tests of caching support to the
t9500-gitweb-standalone-no-errors test: set $caching_enabled to true
and check for errors on the first run (generating the cache) and the
second run (retrieving from the cache) for a single view, namely the
summary view for a project.

If PerlIO::Util is available (see comments), test that cache_fetch
behaves correctly, namely that it saves and restores action output in
the cache, and that it prints generated output or cached output.
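
The capture-and-store flow of cache_fetch can be sketched in Python,
with `contextlib.redirect_stdout` standing in for PerlIO::Util layers
or for swapping *STDOUT. `DictCache` is a hypothetical in-memory
stand-in for the file cache, `action` plays the role of the
`$actions{$action}` code reference, and the key is treated as an
opaque string (per the first commit, gitweb would derive it with
href(-replay=>1, -path_info=>0)).

```python
import contextlib
import io
import sys


class DictCache:
    """In-memory stand-in for GitwebCache::SimpleFileCache (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, data):
        self._data[key] = data


def cache_fetch(cache, key, action):
    """Print the page for `key`: from the cache if present, else by running `action`.

    `action` is a callable that prints the page to STDOUT, playing the role
    of gitweb's $actions{$action} code reference.
    """
    data = cache.get(key)
    if data is None:
        buf = io.StringIO()
        # Capture STDOUT only; STDERR is left alone so errors still reach the log.
        with contextlib.redirect_stdout(buf):
            action()
        data = buf.getvalue()
        cache.set(key, data)
    sys.stdout.write(data)
```

On a miss the action runs once with its output captured; on a hit the
stored text is printed without running the action at all.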
To be implemented (from the original patch by J.H.):
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/README                          |   70 ++++++++++++++++++++++
 gitweb/cache.pm                        |   78 ++++++++++++++++++++++++
 gitweb/gitweb.perl                     |  102 ++++++++++++++++++++++++++++----
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   19 ++++++
 t/t9503/test_cache_interface.pl        |   93 +++++++++++++++++++++++++++++
 6 files changed, 352 insertions(+), 12 deletions(-)

commit 3e471ebd31e881ce1439f23075378c2ec6b95e4d
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:13:31 2010 +0100

gitweb/cache.pm - Adaptive cache expiration time

Add support for adaptive lifetime (cache expiration) control to
GitwebCache::SimpleFileCache. The cache lifetime can be increased or
decreased by any factor, e.g. load average, through the definition of
the 'check_load' callback. Note that using ->set_expires_in, or
unsetting 'check_load' via ->set_check_load(undef), turns off adaptive
caching.

Make gitweb automatically adjust the cache lifetime by load, using the
get_loadavg() function. Define and describe default parameters for
dynamic (adaptive) cache expiration time control.

There are some very basic tests of dynamic expiration time in t9503,
namely checking whether the dynamic expire time is within the given
upper and lower bounds.
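
One way to compute a load-dependent lifetime, sketched in Python: the
lifetime grows linearly with the value returned by a `check_load`
callback and is clamped between a lower and an upper bound. Only
'check_load' and 'expires_min' appear in these commit messages;
`expires_max`, `expires_factor`, the default values and the linear
formula are assumptions for illustration. The `get_loadavg` here reads
the 1-minute load average via `os.getloadavg()`, which is Unix only.

```python
import os


def get_loadavg():
    """1-minute load average, or 0.0 where the OS cannot report it."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return 0.0


def adaptive_expires_in(check_load, expires_min=20, expires_max=1200,
                        expires_factor=60):
    """Scale cache lifetime (seconds) with load, clamped to [expires_min, expires_max].

    The parameter names other than expires_min, and all defaults, are
    illustrative assumptions.
    """
    lifetime = expires_factor * check_load()
    return max(expires_min, min(expires_max, lifetime))
```

Clamping guarantees the property the t9503 tests check: whatever the
load, the dynamic expire time stays within the configured bounds.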
To be implemented (from the original patch by J.H.):
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   55 +++++++++++++++++++++++++++++++++++---
 gitweb/gitweb.perl              |   27 +++++++++++++++++-
 t/t9503/test_cache_interface.pl |   22 +++++++++++++++
 3 files changed, 97 insertions(+), 7 deletions(-)

commit 984390f99c33d82cd4ddbfa6e00c721d9e74cddb
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:13:52 2010 +0100

gitweb: Use CHI compatible (compute method) caching

If $cache provides a CHI compatible ->compute($key, $code) method, use
it instead of the Cache::Cache compatible ->get($key) and
->set($key, $data). While at it, refactor regenerating the cache into
the cache_calculate subroutine.

GitwebCache::SimpleFileCache provides a 'compute' method, which
currently simply uses the 'get' and 'set' methods in the prescribed
manner. Nevertheless the 'compute' method can be more flexible in
choosing when to refresh the cache, and which process is to
refresh/(re)generate the cache entry. In the next commit this method
will use (advisory) locking to prevent the 'cache miss stampede' (aka
'stampeding herd') problem.
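
The dispatch between the two interfaces, and a compute() that merely
wraps get() and set(), can be sketched in Python as follows.
`SimpleCache` and `fetch_or_generate` are hypothetical names. The point
of the design is that callers only ever call compute(), so locking can
later be added inside it without touching the caller.

```python
class SimpleCache:
    """In-memory stand-in providing both get/set and a CHI-style compute()."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, data):
        self._data[key] = data
        return data

    def compute(self, key, code):
        # For now compute() only combines get() and set(); a later step can
        # add locking here without changing callers.
        data = self.get(key)
        if data is None:
            data = self.set(key, code())
        return data


def fetch_or_generate(cache, key, code):
    """Use ->compute() when the cache object has it, else fall back to get/set."""
    if callable(getattr(cache, "compute", None)):
        return cache.compute(key, code)
    data = cache.get(key)
    if data is None:
        data = code()
        cache.set(key, data)
    return data
```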
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm |   39 ++++++++++++++++++++++++++++++++++++---
 1 files changed, 36 insertions(+), 3 deletions(-)

commit 7d0109e4379f5187364edf7c25cdbc5247609f64
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:18:14 2010 +0100

gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem

In the ->compute($key, $code) method from GitwebCache::SimpleFileCache,
use locking (via flock) to ensure that only one process generates data
to update/fill in the cache; the rest wait for the cache to be
(re)generated and then read the data from the cache.

Currently this feature cannot be disabled (via %cache_options).

A test in t9503 shows that when two clients try to simultaneously
access a non-existent or stale cache entry (and generating the data
takes, artificially, a bit of time), the data is (re)generated only
once if they use the ->compute method, as opposed to when those
clients just use the ->get/->set methods.

To be implemented (from the original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   29 ++++++++++++++++-
 t/t9503/test_cache_interface.pl |   65 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)

commit e7985f69eb9000860b155939d5fd7040e30f682f
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:19:21 2010 +0100

gitweb/cache.pm - Serve stale data when waiting for filling cache

When a process fails to acquire the exclusive (writer's) lock, then
instead of waiting for the other process to (re)generate and fill the
cache, serve stale (expired) data from the cache.
This is of course possible only if there is some stale data in the
cache for the given key.

This feature of GitwebCache::SimpleFileCache is used only for the
->update($key, $code) method. It is controlled by the 'max_lifetime'
cache parameter; you can set it to -1 to always serve stale data if it
exists, and you can set it to 0 (or any value smaller than
'expires_min') to turn this feature off.

This feature, as it is currently implemented, makes the ->update()
method a bit asymmetric with respect to the process that acquired the
writer's lock and those processes that didn't, which can be seen in
the new test in t9503. The process that is to regenerate (refresh)
data in the cache must wait for the data to be generated in full
before showing anything to the client, while the other processes show
stale (expired) data immediately.

In order to remove or reduce this asymmetry, gitweb would need to
employ one of two alternative solutions. Either the data should be
(re)generated in the background, so that the process that acquired the
writer's lock would generate data in the background while serving
stale data, or alternatively the process that generates data should
pass output to the original STDOUT while capturing it ("tee" output).

While developing this feature, the ->is_valid() method acquired an
extra optional parameter, through which one can pass an expire time
instead of using the cache-wide global expire time.
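
A sketch of the lock-or-serve-stale logic from this and the previous
commit, in Python for illustration: the first process to get a
non-blocking exclusive flock() regenerates the entry; the others serve
stale data if it is younger than `max_lifetime`, and otherwise block
until the writer finishes and then re-read the cache. Assumptions:
`StaleServingCache` is a hypothetical in-memory stand-in (entries are
`(data, created_at)` pairs rather than files); it uses `fcntl.flock`,
so it is Unix only; and in the usage check, threads holding separate
open() descriptors of one lockfile stand in for separate gitweb
processes (on Linux, flock() locks taken through separate open file
descriptions conflict even within one process). Background
regeneration is not modelled.

```python
import fcntl
import time


class StaleServingCache:
    """In-memory stand-in for the file cache; entries are (data, created_at)."""

    def __init__(self, expires_in, max_lifetime, lock_path):
        self.expires_in = expires_in      # cache-wide expire time, seconds
        self.max_lifetime = max_lifetime  # how stale served data may be; -1 = any age
        self.lock_path = lock_path        # one lockfile shared by all workers
        self._data = {}

    def is_valid(self, key, expires_in=None):
        # Optional expires_in overrides the cache-wide expire time; -1 = never expire.
        if expires_in is None:
            expires_in = self.expires_in
        if key not in self._data:
            return False
        if expires_in < 0:
            return True
        return time.time() - self._data[key][1] < expires_in

    def _get(self, key):
        return self._data[key][0]

    def compute(self, key, code):
        if self.is_valid(key):
            return self._get(key)
        with open(self.lock_path, "a") as lock:
            try:
                # Try to become the writer without blocking.
                fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                # Someone else is regenerating: serve stale data if not too old...
                if self.is_valid(key, self.max_lifetime):
                    return self._get(key)
                # ...otherwise wait for the writer to finish.
                fcntl.flock(lock, fcntl.LOCK_EX)
            try:
                # Re-check: a writer we waited for may already have filled the entry.
                if not self.is_valid(key):
                    self._data[key] = (code(), time.time())
                return self._get(key)
            finally:
                fcntl.flock(lock, fcntl.LOCK_UN)
```

With `max_lifetime` set to 0 the stale branch never serves anything,
so every waiting worker blocks until the writer is done, which matches
turning the feature off.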
To be implemented (from the original patch by J.H.):
* background building
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   23 ++++++++++----
 gitweb/gitweb.perl              |    8 +++++
 t/t9503/test_cache_interface.pl |   63 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 86 insertions(+), 8 deletions(-)

commit 19911970b8a811a6382e39a10b071bff1dd4bd70
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:20:46 2010 +0100

gitweb/cache.pm - Regenerate (refresh) cache in background

This commit removes the asymmetry in serving stale data (if it exists)
when regenerating the cache in GitwebCache::SimpleFileCache. The
process that acquired the exclusive (writer's) lock, and is therefore
selected to be the one that (re)generates data to fill the cache, can
now generate data in the background while serving stale data.

This feature can be enabled or disabled on demand via the
'background_cache' cache parameter. It is turned on by default.

To be implemented (from the original patch by J.H.):
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   36 +++++++++++++++++++++++++++++-------
 gitweb/gitweb.perl              |    9 +++++++++
 t/t9503/test_cache_interface.pl |   14 ++++++++------
 3 files changed, 46 insertions(+), 13 deletions(-)

commit ce97bb5bc1660f6d5c9b9be68c556ac94097978c
Author: Jakub Narebski <jnareb@xxxxxxxxx>
Date:   Sun Feb 7 13:21:10 2010 +0100

gitweb: Show appropriate "Generating..." page when regenerating cache

When a stale/expired (but not too stale) version of the (re)generated
page exists in the cache, gitweb returns the stale version (and
updates the cache in the background, assuming 'background_cache' is
set to a true value). When there is no stale version suitable to
serve the client, we currently have to wait for the data to be
generated in full before showing it. Add to
GitwebCache::SimpleFileCache, via the 'generating_info' callback, the
ability to show the user some activity indicator / progress bar, to
show that we are working on generating the data.

Gitweb itself uses a "Generating..." page as the activity indicator,
which redirects (via <meta http-equiv="Refresh" ...>) to the refreshed
version of the page after the cache is filled (via the trick of not
closing the page, and therefore not closing the connection, until the
data is available in the cache, which is checked by getting a
shared/readers lock on the lockfile for the cache entry). The
git_generating_data_html() subroutine, which gitweb uses to implement
this feature, is highly configurable: you can choose the initial
delay, the frequency of writing some data so that the connection
won't get closed, and the maximum time to wait for data on the
"Generating..." page (see the %generating_options hash).

Currently git_generating_data_html() contains a hardcoded "whitelist"
of actions for which such an HTML "Generating..." page makes sense.

This implements the final feature from the original gitweb output
caching patch by J.H.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>

 gitweb/cache.pm                 |   23 +++++-
 gitweb/gitweb.perl              |  154 ++++++++++++++++++++++++++++++++++++++-
 t/t9503/test_cache_interface.pl |   45 +++++++++++
 3 files changed, 216 insertions(+), 6 deletions(-)

-- 
Jakub Narebski
Poland