First, what is the status of gitweb output caching? I have sent a minimal
fixup for the latest John 'Warthog9' Hawley series as "Gitweb caching
v7.1"[1]. John said that he can work on fixes on November 2 at the earliest.

[1] http://thread.gmane.org/gmane.comp.version-control.git/160147/focus=160473

While working on my rewrite of gitweb output caching, based on ideas (and
some code) from John 'Warthog9' Hawley's (J.H.) series, I have run into the
following issues:

1. Error handling

There are three separate places where we can encounter an error during
caching. They are:

  A. setting up the cache (initialization)
  B. retrieving data from the cache
  C. storing (saving) data in the cache

There are two possible choices for errors during cache initialization
(case A):

  * turn off caching
  * show "500 Internal Server Error"

The problem with turning off caching is that it hides the error, so the
admin may not notice that caching does not work. Therefore I think the
better solution would be to generate an error, especially since such errors
are usually not incidental. Note that in my opinion the cache should not be
initialized at all if caching is disabled.

The modern CHI caching interface provides a way to change how runtime
errors are handled via the 'on_get_error' and 'on_set_error' parameters to
the constructor. The default for CHI is to log the error, or to ignore it
if no logger is set. You can also provide a coderef (callback).

For errors during getting data from the cache (case B), we can either
ignore the error, which means generating the data as if the cache were
disabled / the entry did not exist in the cache, or error out and show the
error to the user. Such errors should be rare, so erroring out might be an
appropriate reaction. Note that we already have the special case of a
zero-sized cache entry file, which we treat as a non-existent entry.

For errors during setting (saving) data to the cache (case C), we can
ignore the error and not re-generate the cache (not even calling the
callback if the error is detected early), or error out.
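The policy choices for cases B and C could be sketched roughly as below. This is a core-Perl illustration only: the cache class and all names in it are hypothetical stand-ins, not gitweb code (CHI itself achieves the same effect via its 'on_get_error' / 'on_set_error' constructor parameters).

```perl
use strict;
use warnings;

package TinyCache;

# Hypothetical cache with CHI-style error callbacks, called as
# ($operation, $key, $error) when a read or write fails.
sub new {
    my ($class, %opt) = @_;
    return bless { store => {}, %opt }, $class;
}

sub get {
    my ($self, $key) = @_;
    my $val = eval { $self->_read($key) };
    if ($@) {
        # Case B policy: report the error, then behave as a cache miss,
        # i.e. the caller regenerates the data.
        $self->{on_get_error}->('get', $key, $@) if $self->{on_get_error};
        return undef;
    }
    return $val;
}

sub set {
    my ($self, $key, $val) = @_;
    eval { $self->_write($key, $val) };
    if ($@) {
        # Case C policy: report the error; the data simply is not cached.
        $self->{on_set_error}->('set', $key, $@) if $self->{on_set_error};
    }
}

sub _read  { my ($self, $key) = @_; die "read error\n" if $self->{broken}; return $self->{store}{$key}; }
sub _write { my ($self, $key, $val) = @_; die "no space left on device\n" if $self->{full}; $self->{store}{$key} = $val; }

package main;

my @errors;
my $cache = TinyCache->new(
    full         => 1,     # simulate a full filesystem (ENOSPC) on writes
    on_set_error => sub { my ($op, $key, $err) = @_; push @errors, $err },
);
$cache->set('projects_list', 'page content');   # write fails, error only recorded
print "cached entry: ", defined $cache->get('projects_list') ? 'yes' : 'no', "\n";
```

With this policy a full filesystem degrades to "caching silently off" for the affected entries, while the callback still gives the admin a place to log the condition.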
Such errors can happen for example if the filesystem gets full. It might be
better to ignore such errors, at least in the case of ENOSPC / ENOMEM.

2. Progress info and not caching error pages

J.H. discovered (or invented) a very nice hack to generate a progress info
page without Ajax (and therefore without the need for JavaScript), to make
the user aware that data is being regenerated, and that he/she should be
patient instead of refreshing or complaining. This trick uses the fact that
an internal redirect using meta refresh[2]

  <meta http-equiv="refresh" content="0" />

does not occur until the web browser gets the full page (which means that
the connection gets closed / the request ends).

[2] http://en.wikipedia.org/wiki/Meta_refresh

Besides the fact that, as J.H. wrote:

JH> There are still a few known "issues" with respect to this:
JH> - Code needs to be added to be "browser" aware so
JH>   that clients like wget that are trying to get a
JH>   binary blob don't obtain a "Generating..." page

there is an additional problem, at least in my rewrite (I didn't follow the
code flow in J.H.'s v7 patch series to check whether it is also affected).

Let's assume that the 'progress_info' feature is turned on, and that the
cache is generated in the background. Let's assume that error pages
(die_error) are not cached. Now a client tries to access a page which
would/will result in an error.

With caching, it acquires the exclusive (writers) lock, checks that there
is no stale data (error pages are never cached), checks that it is provided
with a 'generating_info' subroutine, and so forks a process to generate
data in the background. The background process detaches and tries to
generate data to cache; die_error is called. die_error turns off capturing,
and prints to STDOUT. At the end of die_error there is a jump to the outer
scope, to DONE_GITWEB; therefore $lock_fh goes out of scope, the lockfile
is closed, and the lock is released. The background process finishes.

The parent process runs the 'generating_info' subroutine.
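For reference, a 'generating_info' page based on the meta refresh trick might look roughly like the sketch below. This is only an illustration (the subroutine name and markup are mine, not gitweb's actual git_generating_data_html):

```perl
use strict;
use warnings;

# Sketch of a progress-info page using the meta refresh trick described
# above.  Because the browser acts on <meta http-equiv="refresh"> only
# after the connection is closed, the client re-requests the same URL
# once the process holding the connection open has finished.
sub generating_info_page {
    my ($title) = @_;
    return <<"EOF";
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="0" />
<title>$title</title>
</head>
<body>
<p>Generating page... it will reload automatically when ready.</p>
</body>
</html>
EOF
}

print generating_info_page('Generating...');
```

The problem described next is exactly that for never-cached error pages, each such reload finds the cache empty again.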
Now if it waits a bit, like in my rewrite, before employing the trick
mentioned above, and does not print anything if the lockfile is released
before the startup delay, _and_ die_error finishes within this delay, then
everything is all right: the error message is sent to the client.

If die_error doesn't finish before git_generating_data_html prints the meta
refresh, or if there is no startup delay, then error pages get infinite
redirects (remember: there is never anything in the cache for error pages).
This is a bad thing.

One possible solution to this problem (besides employing a startup delay)
is to tee the output, i.e. print it as it is being captured. The streamed
(partial) response would serve as a progress indicator for the process
(re)generating the cache; only parallel processes waiting for the cache
would show 'generating_info'. I think that in the current implementation of
capturing it would be as simple as not closing STDOUT, but I'd have to
check that.

3. Using a generic cache engine, and memory consumption

Most Perl caching interfaces support only $cache->set($key, $data), where
$data is a Perl variable, and $data = $cache->get($key), or their
equivalents. Even with file-based cache drivers you save from memory and
read into memory.

The only exception I know of is the Cache interface with the Cache::File
driver, which provides $cache->handle($key [, $mode]), where the optional
$mode argument can be any of the mode strings that can be used in the
'open' function, or one of the fopen(3) modes. (Of course for memory-based
or network-based (like memcached) caches it might not make sense to provide
such an interface.)

The most often recommended capture module, Capture::Tiny, allows capturing
only into a scalar (into memory):

  $stdout = capture \&code;

or (using its prototype)

  $stdout = capture { <code> };

Well, it is *::Tiny, so it supports a minimal API.
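Coming back to the tee idea from section 2: in core Perl (no capture module), teeing could be sketched with a tied filehandle. This is only a proof-of-concept sketch, with hypothetical names; gitweb's capture code does not work this way.

```perl
use strict;
use warnings;

package TeeStdout;

# Tied-handle class: everything printed is both appended to a buffer
# (to be stored in the cache afterwards) and streamed to the original
# STDOUT (so the client sees partial output as progress indication).
sub TIEHANDLE {
    my ($class, $buf_ref, $orig_fh) = @_;
    return bless { buf => $buf_ref, orig => $orig_fh }, $class;
}
sub PRINT {
    my $self = shift;
    ${ $self->{buf} } .= join('', @_);   # capture for the cache...
    print { $self->{orig} } @_;          # ...and stream to the client
}
sub PRINTF {
    my ($self, $fmt, @args) = @_;
    $self->PRINT(sprintf $fmt, @args);
}

package main;

my $captured = '';
open my $orig_stdout, '>&', \*STDOUT or die "cannot dup STDOUT: $!";
tie *STDOUT, 'TeeStdout', \$captured, $orig_stdout;

print "Hello from the page being generated\n";   # reaches client AND buffer

untie *STDOUT;
# $captured now holds everything that was printed, ready to be saved.
```

Note that a tie only intercepts Perl-level print; code writing to the underlying file descriptor directly would bypass it, which is why modules like Capture::Tiny redirect at the file-descriptor level instead.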
From the list of different capture modules that allow capturing of Perl
code output, with different interfaces (APIs), in the "See also" section of
the Capture::Tiny documentation[3], only IO::CaptureOutput allows capturing
into a specified file:

  capture \&code, \$stdout, undef, $outfile;

[3] http://p3rl.org/Capture::Tiny

J.H.'s gitweb output caching v7 captures output directly into cache files.
The problem with doing this in my rewrite is to allow capturing directly
into the cache entry file without losing the ability to select a different
caching engine, which might not be file-based (like e.g. a memcached-based
one).

P.S. You can always check out my latest work on the gitweb output caching
rewrite here (warning: rebased!):

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-pu

(it is also available in git://github.com/jnareb/git.git)

-- 
Jakub Narębski
Poland