[RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Instead of having gitweb use progress info indicator / throbber to
notify user that data is being generated by current process, gitweb
can now (provided that PerlIO::tee from PerlIO::Util is available)
send page to web browser while simultaneously saving it to cache
(print and capture, i.e. tee), thus having incremental generating of
page serve as a progress indicator.


To do this, the GitwebCache::Capture::ToFile module acquired ->tee()
subroutine, similar to ->capture(), but it prints while capturing
output.  The ->tee() method (and its worker methods ->tee_start() and
->tee_end()) are available only if PerlIO::tee from PerlIO::Util
distribution is present.  Tests checking if this feature works as
expected were added to t9510 test.

An alternative would be to provide two versions of
GitwebCache::Capture::ToFile.  Note also that PerlIO::tee is not
strictly necessary, as Capture::Tiny shows, but in most generic case
(like done in Capture::Tiny) one needs separate process functioning as
multplexer.


Because tee-ing (printing while capturing) can function as a kind of
progress indicator only for process generating the data for cache
entry, and not for the processes waiting for data to be generated,
therefore 'generating_info' got splitinto 'get_progress_info' and
'set_progress_info'.  You can set now in GitwebCache::FIleCacheWithLocking
those two separately.  You are expected to unset 'set_progress_info'
when using tee-ing capturing engine.  Some tests added to t9511 with
tee-like situation.

As a proof of concept gitweb now uses two slightly different versions
of "Generating..." page; if you worry about interaction between progress
indicator and non-cacheable error pages, you can set 'set_progress_info'
separately to undef.


The cache_output subroutine from GitwebCache::CacheOutput got updated
to use ->tee() subroutine if $capture supports it.  If ->tee() is
used, then of course generated data doesn't need to and shouldn't be
printed; also cache_output unsets 'set_progress_info' locally.  Note
that ->tee() is used only if we are not in background process; if we
are in background process, simple ->capture() is used.  No new tests
for now.

Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>
---
Note: the change to t/gitweb-lib.sh and some of changes to t9510 are
incidental fixes; original commits should be fixed instead.

This is proof of concept (PoC) patch, showing how one can use
"tee"-ing in capturing engine together with gitweb output caching.
Because we don't need and don't use 'generating_info' subroutine for
process that is writing data (one that acquired writers lock) we are
(or at least should be) now safe to have error pages not cached.

Currently the "tee"-ing support requires PerlIO::tee module from the
PerlIO::Util distribution, as it was easiest way to add such feature.
In the future we would have probably to do something similar what
'tee' in Capture::Tiny (or in other capture modules) does.  I'm not
sure if PerlIO::Util is packaged as RPM package anywhere...; well, I have
googled that ALT Linux has it: http://sisyphus.ru/en/srpm/perl-PerlIO-Util

I have only ran tests, I haven't actually run gitweb with those
changes... :-P

 gitweb/gitweb.perl                             |   32 +++++++----
 gitweb/lib/GitwebCache/CacheOutput.pm          |   18 ++++++-
 gitweb/lib/GitwebCache/Capture/ToFile.pm       |   67 ++++++++++++++++++++++-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   52 +++++++++++++-----
 t/gitweb-lib.sh                                |    2 +-
 t/t9510/test_capture_interface.pl              |   28 +++++++++-
 t/t9511/test_cache_interface.pl                |   29 ++++++++++
 7 files changed, 194 insertions(+), 34 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5ef668d..de283a0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -325,19 +325,17 @@ our %cache_options = (
 	# serving possibly stale data.
 	'background_cache' => 1,
 
-	# Subroutine which would be called when gitweb has to wait for data to
+	# Subroutines which would be called when gitweb has to wait for data to
 	# be generated (it can't serve stale data because there isn't any,
-	# or if it exists it is older than 'max_lifetime').  The default
-	# is to use git_generating_data_html(), which creates "Generating..."
-	# page, which would then redirect or redraw/rewrite the page when
-	# data is ready.
-	# Set it to `undef' to disable this feature.
+	# or if it exists it is older than 'max_lifetime').
+	# Set them to `undef' to disable this feature.
 	#
-	# Such subroutine (if invoked from GitwebCache::FileCacheWithLocking)
+	# Such subroutines (if invoked from GitwebCache::FileCacheWithLocking)
 	# is passed the following parameters: $cache instance, human-readable
 	# $key to current page, and $sync_coderef subroutine to invoke to wait
 	# (in a blocking way) for data.
-	'generating_info' => \&git_generating_data_html,
+	'get_progress_info' => \&git_get_progress_info_html,
+	'set_progress_info' => \&git_set_progress_info_html,
 
 	# How to handle runtime errors occurring during cache gets and cache
 	# sets.  Options are:
@@ -3721,9 +3719,21 @@ sub get_page_title {
 	return $title;
 }
 
+sub git_get_progress_info_html {
+	git_generating_data_html("Waiting", @_);
+}
+
+sub git_set_progress_info_html {
+	# minimum startup delay is 2 seconds, just in case, for error handling
+	local $generating_options{'startup_delay'} =
+		$generating_options{'startup_delay'} > 2 ? $generating_options{'startup_delay'} : 2;
+
+	git_generating_data_html("Generating", @_);
+}
+
 # creates "Generating..." page when caching enabled and not in cache
 sub git_generating_data_html {
-	my ($cache, $key, $sync_coderef) = @_;
+	my ($msg, $cache, $key, $sync_coderef) = @_;
 
 	# when should gitweb show "Generating..." page
 	if ((defined $actions_info{$action}{'output_format'} &&
@@ -3749,7 +3759,7 @@ sub git_generating_data_html {
 		}
 	}
 
-	my $title = "[Generating...] " . get_page_title();
+	my $title = "[$msg...] " . get_page_title();
 	# TODO: the following line of code duplicates the one
 	# in git_header_html, and it should probably be refactored.
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
@@ -3786,7 +3796,7 @@ sub git_generating_data_html {
 EOF
 
 	local $| = 1; # autoflush
-	print STDOUT 'Generating...';
+	print STDOUT "$msg...";
 
 	my $total_time = 0;
 	my $interval = $generating_options{'print_interval'} || 1;
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 188d4ab..3bcd35b 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -42,6 +42,14 @@ sub cache_output {
 
 
 	my $pid = $$;
+	my $can_tee = $capture->can('tee');
+	# if $capture can tee, we don't need progress info for generating (on set).
+	# the below breaks encapsulation, but it is a bit simpler than
+	# $old = $cache->get_...; $cache->set_...(...); ...; $cache->set_...($old);
+	local $cache->{'set_progress_info'} = undef
+		if ($can_tee);
+
+	my $printed = 0;
 	my ($fh, $filename);
 	my ($capture_fh, $capture_filename);
 	eval { # this `eval` is to catch rethrown error, so we can print captured output
@@ -54,7 +62,12 @@ sub cache_output {
 			}
 
 			# this `eval` is to be able to cache error output (up till 'die')
-			eval { $capture->capture($code, $capture_fh); };
+			if ($can_tee && $$ == $pid) {
+				$printed = 1;
+				eval { $capture->tee($code, $capture_fh); };
+			} else {
+				eval { $capture->capture($code, $capture_fh); };
+			}
 
 			# note that $cache can catch this error itself (like e.g. CHI);
 			# use "die"-ing error handler to rethrow this exception to outside
@@ -69,7 +82,8 @@ sub cache_output {
 		$filename ||= $capture_filename;
 	}
 
-	if (defined $fh || defined $filename) {
+	if ((defined $fh || defined $filename) &&
+	    !$printed) { # did we tee, i.e. already printed output?
 		# set binmode only if $fh is defined (is a filehandle)
 		# File::Copy::copy opens files given by filename in binary mode
 		binmode $fh,    ':raw' if (defined $fh);
diff --git a/gitweb/lib/GitwebCache/Capture/ToFile.pm b/gitweb/lib/GitwebCache/Capture/ToFile.pm
index d2dbf0f..0290ec4 100644
--- a/gitweb/lib/GitwebCache/Capture/ToFile.pm
+++ b/gitweb/lib/GitwebCache/Capture/ToFile.pm
@@ -20,6 +20,10 @@ use warnings;
 use PerlIO;
 use Symbol qw(qualify_to_ref);
 
+BEGIN {
+	eval { use PerlIO::Util; };
+}
+
 # Constructor
 sub new {
 	my $class = shift;
@@ -30,22 +34,41 @@ sub new {
 	return $self;
 }
 
-sub capture {
+sub capture_or_tee {
 	my $self = shift;
 	my $code = shift;
+	my ($start, $stop) = @{ shift() };
 
-	$self->capture_start(@_); # pass rest of params
+	$self->$start(@_); # pass rest of params
 	eval { $code->(); 1; };
 	my $exit_code = $?; # save this for later
 	my $error = $@;     # save this for later
 
-	my $got_out = $self->capture_stop();
+	my $got_out = $self->$stop();
 	$? = $exit_code;
 	die $error if $error;
 
 	return $got_out;
 }
 
+sub capture {
+	my ($self, $code, @args) = @_;
+
+	return
+		$self->capture_or_tee($code, ['capture_start', 'capture_stop'], @args);
+}
+
+BEGIN {
+	if ($INC{'PerlIO/Util.pm'}) {
+		*tee = sub {
+			my ($self, $code, @args) = @_;
+
+			return
+				$self->capture_or_tee($code, ['tee_start', 'tee_stop'], @args);
+		};
+	}
+}
+
 # ----------------------------------------------------------------------
 
 # Start capturing data (STDOUT)
@@ -92,6 +115,44 @@ sub capture_stop {
 	return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
 }
 
+# ......................................................................
+
+BEGIN {
+	if ($INC{'PerlIO/Util.pm'}) {
+		*tee_start = sub {
+			my ($self, $to) = @_;
+
+			# save layers, to replay them on top of 'tee' layer (?)
+			my @layers = PerlIO::get_layers(\*STDOUT);
+
+			$self->{'to'} = $to;
+			*STDOUT->push_layer('tee' => $to);
+
+			_relayer(\*STDOUT, \@layers); # is it necessary?
+
+			# started tee-ing
+			$self->{'teeing'} = 1;
+		};
+		*tee_stop = sub {
+			my $self = shift;
+
+			# return if we didn't start tee-ing
+			return unless delete $self->{'teeing'};
+
+			my @top_layers;
+			while ((my $layer = *STDOUT->pop_layer()) ne 'tee') {
+				push @top_layers, $layer;
+			}
+			binmode(STDOUT, join(":", ":", @top_layers));
+			# or is it binmode(STDOUT, join(":", ":raw", @top_layers));
+
+			return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
+		};
+	}
+}
+
+# ----------------------------------------------------------------------
+
 # taken from Capture::Tiny by David Golden, Apache License 2.0
 # with debugging stripped out
 sub _relayer {
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 291526e..09ae7b2 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -76,12 +76,17 @@ our $EXPIRE_NOW = 0;
 #  * 'background_cache' (boolean)
 #    This enables/disables regenerating cache in background process.
 #    Defaults to true.
-#  * 'generating_info'
+#  * 'get_progress_info',
+#    'set_progress_info',
+#    'generating_info' (code reference)
 #    Subroutine (code) called when process has to wait for cache entry
 #    to be (re)generated (when there is no not-too-stale data to serve
 #    instead), for other process (or bacground process).  It is passed
 #    $cache instance, $key, and $wait_code subroutine (code reference)
 #    to invoke (to call) to wait for cache entry to be ready.
+#    'get_progress_info' gets called on getting data from cache, i.e.
+#    when waiting for data to be generated, 'set_progress_info' gets
+#    called when waiting to generate data; 'generating_info' sets both.
 #    Unset by default (which means no activity indicator).
 #  * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
 #    How to handle runtime errors occurring during cache gets and cache
@@ -120,8 +125,14 @@ sub new {
 	$self->{'background_cache'} =
 		exists $opts{'background_cache'} ? $opts{'background_cache'} :
 		1;
-	$self->{'generating_info'} = $opts{'generating_info'}
-		if exists $opts{'generating_info'};
+	$self->{'get_progress_info'} =
+		exists $opts{'get_progress_info'} ? $opts{'get_progress_info'} :
+		exists $opts{'generating_info'}   ? $opts{'generating_info'} :
+		undef;
+	$self->{'set_progress_info'} =
+		exists $opts{'set_progress_info'} ? $opts{'set_progress_info'} :
+		exists $opts{'generating_info'}   ? $opts{'generating_info'} :
+		undef;
 	$self->{'on_error'} =
 		exists $opts{'on_error'}      ? $opts{'on_error'} :
 		exists $opts{'on_get_error'}  ? $opts{'on_get_error'} :
@@ -142,7 +153,7 @@ sub new {
 
 # creates get_depth() and set_depth($depth) etc. methods
 foreach my $i (qw(depth root namespace expires_in max_lifetime
-                  background_cache generating_info
+                  background_cache get_progress_info set_progress_info
                   on_error)) {
 	my $field = $i;
 	no strict 'refs';
@@ -156,14 +167,25 @@ foreach my $i (qw(depth root namespace expires_in max_lifetime
 	};
 }
 
-# $cache->generating_info($wait_code);
-# runs 'generating_info' subroutine, for activity indicator,
-# checking if it is defined first.
-sub generating_info {
+sub set_generating_info {
 	my $self = shift;
 
-	if (defined $self->{'generating_info'}) {
-		$self->{'generating_info'}->($self, @_);
+	$self->set_get_progress_info(@_);
+	$self->set_set_progress_info(@_);
+}
+
+# $cache->{get,set}_progress_info($key, $wait_code);
+# runs '{get,set}_progress_info' subroutine, for activity indicator,
+# checking if it is defined first.
+foreach my $name qw(get_progress_info set_progress_info) {
+	my $method = $name;
+	no strict 'refs';
+	*{"$method"} = sub {
+		my $self = shift;
+
+		if (defined $self->{$name}) {
+			$self->{$name}->($self, @_);
+		}
 	}
 }
 
@@ -269,11 +291,11 @@ sub _tempfile_to_path {
 # Wait for data to be available using (blocking) $code,
 # then return filehandle and filename to read from for $key.
 sub _wait_for_data {
-	my ($self, $key, $sync_coderef) = @_;
+	my ($self, $key, $progress_info, $sync_coderef) = @_;
 	my @result;
 
 	# provide "generating page..." info, if exists
-	$self->generating_info($key, $sync_coderef);
+	$self->$progress_info($key, $sync_coderef);
 	# generating info may exit, so we can not get there
 
 	# wait for data to be available
@@ -300,7 +322,7 @@ sub _set_maybe_background {
 		# or if main process would show progress indicator
 		$detach = @stale_result;
 		$pid = fork()
-			if (@stale_result || $self->{'generating_info'});
+			if (@stale_result || $self->{'set_progress_info'});
 	}
 
 	if ($pid) {
@@ -503,7 +525,7 @@ sub compute_fh {
 					or $self->_handle_error("Couldn't reopen (for reading) lockfile '$lockfile': $!");
 
 				eval {
-					@result = $self->_wait_for_data($key, sub {
+					@result = $self->_wait_for_data($key, 'set_progress_info', sub {
 						flock($lock_fh, LOCK_SH);
 						# or 'waitpid -1, 0;', or 'wait;', as we don't detach now in this situation
 					});
@@ -534,7 +556,7 @@ sub compute_fh {
 			# wait for regeneration if no stale data to serve,
 			# using shared / readers lock to sync (wait for data)
 			eval {
-				@result = $self->_wait_for_data($key, sub {
+				@result = $self->_wait_for_data($key, 'get_progress_info', sub {
 					flock($lock_fh, LOCK_SH);
 				});
 			};
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 8652c91..f0ef009 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -57,7 +57,7 @@ gitweb_enable_caching () {
 		cat >>gitweb_config.perl <<-\EOF &&
 		$caching_enabled = 1;
 		$cache_options{"expires_in"} = -1;      # never expire cache for tests
-		$cache_options{"cache_root"} = "$TRASH_DIRECTORY/cache"; # to clear the right thing
+		$cache_options{"cache_root"} = "cache"; # to clear the right thing
 		$cache_options{"background_cache"} = 0; # no background processes in test suite
 		$cache_options{"generating_info"} = undef; # tests do not use web browser
 		EOF
diff --git a/t/t9510/test_capture_interface.pl b/t/t9510/test_capture_interface.pl
index 6d90497..35e46ad 100755
--- a/t/t9510/test_capture_interface.pl
+++ b/t/t9510/test_capture_interface.pl
@@ -116,8 +116,8 @@ $captured = $outer_capture->capture(sub {
 	print "|post";
 }, 'outer_actual');
 
-my $inner = read_file('inner_actual');
-my $outer = read_file('outer_actual');
+$inner = read_file('inner_actual');
+$outer = read_file('outer_actual');
 
 is($inner, "INNER:pre|",
    'nested capture with die: inner output captured up to die');
@@ -125,6 +125,30 @@ is($outer, "pre|@=die from inner\n|post",
    'nested capture with die: outer caught rethrown exception from inner');
 
 
+# Testing tee feature, if available
+#
+SKIP: {
+	skip "PerlIO::Util module not found", 3
+		unless eval { require PerlIO::Util; 1 };
+
+	can_ok($capture, 'tee');
+
+	$captured = $outer_capture->capture(sub {
+		print "pre|";
+		my $captured = $capture->tee(sub {
+			print "INNER";
+		}, 'inner_actual');
+		print "|post";
+	}, 'outer_actual');
+
+	$inner = read_file('inner_actual');
+	$outer = read_file('outer_actual');
+
+	is($inner, "INNER",          'tee: captured');
+	is($outer, "pre|INNER|post", 'tee: printed');
+};
+
+
 done_testing();
 
 # Local Variables:
diff --git a/t/t9511/test_cache_interface.pl b/t/t9511/test_cache_interface.pl
index 1e8feb3..83f3894 100755
--- a/t/t9511/test_cache_interface.pl
+++ b/t/t9511/test_cache_interface.pl
@@ -154,6 +154,14 @@ sub get_value_slow_fh {
 	sleep $slow_time;
 	print {$fh} $value;
 }
+sub tee_value_slow_fh {
+	my $fh = shift;
+
+	$call_count++;
+	sleep $slow_time;
+	print       $value;
+	print {$fh} $value;
+}
 sub get_value_die {
 	$call_count++;
 	die "get_value_die\n";
@@ -402,6 +410,26 @@ subtest 'generating progress info' => sub {
 	);
 
 
+	# with background generation, tee-like, no stale data
+	$cache->set_set_progress_info(undef);
+	$cache->set_background_cache(1);
+	$cache->remove($key); # no data and no stale value
+	$call_count = 0;
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&tee_value_slow_fh);
+		print "$sep$call_count$sep";
+		print $data if defined $data;
+	};
+	my $getting_output = (grep /\Q${sep}0${sep}\E/, @output)[0];
+	my $setting_output = (grep /\Q${sep}1${sep}\E/, @output)[0];
+	is($getting_output, "$progress_info${sep}0$sep$value",
+	   'background, no stale, tee: waiting process prints progress, gets data');
+	is($setting_output, "$value${sep}1$sep$value",
+	   'background, no stale, tee: generating process prints data, sets data');
+	$cache->set_generating_info(\&test_generating_info); # restore
+
+
 	# with background generation, with stale value
 	cache_set_fh($cache, $key, $stale_value);
 	$cache->set_expires_in(0);    # set value is now expired
@@ -427,6 +455,7 @@ subtest 'generating progress info' => sub {
 };
 $cache->set_expires_in(-1);
 
+
 done_testing();
 
 
-- 
1.7.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]