On Sat, 6 Feb 2010, J.H. wrote: > > Table of contents: > > ~~~~~~~~~~~~~~~~~~ > > [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing > > for caching) I am working on v2 of this series, where this patch is not necessary. It uses *STDOUT->push_layer(scalar => \$data) and *STDOUT->pop_layer() from PerlIO::Util if it is available, and manipulation of *STDOUT (which means *STDOUT = $data_fh and not $out = $data_fh). But I must say that doing capture of STDOUT (only; STDERR is not captured) without requiring extra Perl modules (like recommended Capture::Tiny or e.g. IO::Capture), and especially testing that it works correctly with capturing output of cache_fetch is serious PITA. This patch has the advantage that all operations are simpler. In particular it is easy to have section which should be not captured, or where capture should be turned off (slightly different). It has the disadvantage that all future contributions must use "print $out <something>" / "print {$out} <something>", and that contributions from before this change would have to be carefully updated. (Well, we could probably add the test that would check that everything that needs to go to $out does, and everything that shouldn't got to $out but to STDOUT doesn't.) If I were to have such patch in new version of "gitweb output caching" series, I would make the following changes: * (optionally) use simpler 'print $out <sth>' instead of visually distinct 'print {$out} <sth>', where from first glance one can see that $out is filehandle and not something to be printed * use short filehandle name: $out, or $oh, or $o/$O. * split above patch in 2 to 4 patches: - pure mechanical (scripted) change: + print <sth> -> print $out <sth> + printf(<sth>) -> printf($out <sth>) + binmode STDOUT -> binmode $out The last with possible exception of very first binmode call. - realign (purely whitespace change) - wrap too long lines (newlines and whitespace), optional - change $out to $bout/$bin ($binary_output_fh) where needed; but see comment below (optional) > > This looks fine, I did some quick testing to verify that this would work > - and it does. I have only ran test, and didn't actually check that it works correctly. This commit shouldn't change gitweb behaviour at all. > > The only caveat that needs to be aware is that if the layer is going to > output binary data it needs to flip the whole stream to :raw before > outputting (this is going to be more specific to the caching layer). > > One advantage to having the file handles separate is that it's easier to > distinguish if the data is going to need to be binary data that will > need to be flipped properly. I don't think that it would be needed. First, all mode changing operations, i.e. calls to binmode are changed to act on $out rather than on STDOUT it means. It means that if we are using 'in memory file' to capture output to scalar variable, then captured data would be properly converted in variable. So it would be enough to save this variable in :raw mode to file. If we are saving directly to cache file, then of course saved data would go through layer and would be converted properly. In any case in cache file we would have _already_ _converted_ data. This means that regardless whether $out used ':utf8' (pseudo)layer, or ':raw' (pseudo)layer, if we read from cache file in ':raw' (binary mode) and print data from cache to original (true) STDOUT also in ':raw' mode, we would print correctly formatted data. > > Also means you could cache the binary data differently than textual data. > > I.E. binary data gets saved to disk, but page data gets saved to memcached. That's true, but on the other hand it would be easy to add some extra command marking data as binary below binmode. Or we can examine IO layers (using PerlIO::get_layers($out); the PerlIO module is in Perl core) if there is 'utf8' layer or 'raw' (pseudo)layer. > > Just food for thought, I'm not sure which way makes more sense > personally, though I would have a tendency to err on the side of > flexibility and have both. It might be good idea... but nevertheless I'd like to have short name for binary filehandle, if we decode to keep it. What should it be? $bout, $bin, $B, $bin_out, $out_bin, $bin_fh? > > > [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) > > note: delaying additional comment till I've finished reading through the > basics of the following patches. This is to use later _full_ _normalized_ URI as cache key for given page. IIRC in your original patch you ignored path_info; but on the other hand git.kernel.org has path_info feature turned off... > > > [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching > > Ok this is quite the departure from what I had, I'm unsure that it's the > right way to go, but it obviously has merits (I.E. much simpler addition > of any Cache::Cache or CHI compatible caching layer) > > This patch itself looks fine, and as it states it borrows heavily from > my basic implementation - just wraps it differently. I might have some > thoughts on extending this a bit to be a bit more flushed out from a > basic standpoint. > > Need to dig through it some more, but I'm generally ok with it. Note that the new implementation in (not send yet) new version of "gitweb output caching" series is based more on newer and more modern CHI unified interface rather than older Cache::Cache interface. It is I think much cleaner and easier to read. The major difference from your implementation is that in my version the gitweb caching engine uses "save to temporary file + rename file to final name" method to have atomic write to cache (atomic cache filling). It should be more robust, but OTOH it introduces a bit of performance penalty. With locking and single writer we could use predictable temporary file name rather than using tempfile/mkstemp or equivalent from File::Temp, or UUID based filename like CHI does it. Also, tests. Current code (even the v2 version) lacks proper error detection, error signalling and logging. > > > [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration > > Looks fine to me, though the note about getting the errors should get > moved to previous patch, as it says. I wanted to get this series out faster, that is why it is not polished. > > Note: I'm going to stop here as the following are WIP and I want to play > around with this particular direction on my own a little more before > further comment. There's some ideas running around I want to try and > get down in code first. Me moving on and trying these other ideas is > not a reflection on the following patches, just some alternative > thinking before I discuss some other ideas on the following patches. Take a look at gitweb/cache-kernel-v2 branch (the new caching series). Note however that it would be subject to rebasing / changes. > > Also I've been sitting on this e-mail in this state for almost a week > while I've been playing with this and having to fight other fires and I > know that Jakub has been looking for commentary on this. Thank you very much for your commentary, in spite of your heavy load. > > > [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) > > output caching (WIP) > > [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) > > [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) > > [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' > > problem (WIP) > > [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for > > filling cache (WIP) > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > regenerating cache (WIP) There is new version of this series in gitweb/cache-kernel-v2 in my git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz. Now all commits have proper description (for first series one had to read comment section in emails for commit description), and all features are tested (at least on API level, and to some extent) -- full tests do require having PerlIO::Util installed (I have done it following local::lib and installing it from 'cpan' client), though. Also all features are fully configurable, to even greater extent than in original series by J.H. (this what what v1 was lacking). And there is (see diffstat) section about caching in gitweb/README. The following changes since commit d5f8a3d6f4d946c33459e00edf02819f89711777: Junio C Hamano (1): Merge branch 'master' into next are available in the git repository at: git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v2 You can view it via gitweb at: http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel-v2 http://repo.or.cz/w/git/jnareb-git.git/log/refs/heads/gitweb/cache-kernel-v2 SHORTLOG (10): gitweb: href(..., -path_info => 0|1) gitweb/cache.pm - Very simple file based caching gitweb/cache.pm - Stat-based cache expiration gitweb: Use Cache::Cache compatibile (get, set) output caching gitweb/cache.pm - Adaptive cache expiration time gitweb: Use CHI compatibile (compute method) caching gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem gitweb/cache.pm - Serve stale data when waiting for filling cache gitweb/cache.pm - Regenerate (refresh) cache in background gitweb: Show appropriate "Generating..." page when regenerating cache gitweb/README | 70 +++++ gitweb/cache.pm | 527 ++++++++++++++++++++++++++++++++ gitweb/gitweb.perl | 301 +++++++++++++++++- t/gitweb-lib.sh | 2 + t/t9500-gitweb-standalone-no-errors.sh | 19 ++ t/t9503-gitweb-caching.sh | 32 ++ t/t9503/test_cache_interface.pl | 380 +++++++++++++++++++++++ t/test-lib.sh | 3 + 8 files changed, 1319 insertions(+), 15 deletions(-) create mode 100644 gitweb/cache.pm create mode 100755 t/t9503-gitweb-caching.sh create mode 100755 t/t9503/test_cache_interface.pl -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html