[PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache

When a process fails to acquire the exclusive (writer's) lock, then instead
of waiting for the other process to (re)generate and fill the cache, it
serves stale (expired) data from the cache.  This is of course possible
only if there is some stale data in the cache for the given key.
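The control flow described above can be sketched as a stand-alone Perl function. This is a simplified illustration, not the code in the patch. compute_serving_stale() and its code-ref arguments are hypothetical. The $stale callback is assumed to return data only if it is not older than 'max_lifetime'. The sketch tries a non-blocking writer's lock first. If that fails, it serves stale data, and only falls back to waiting on a reader's lock when there is none.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical stand-alone sketch of the "serve stale data" flow.
#   $fresh    - returns non-expired cached data, or an empty list
#   $stale    - returns expired data still within 'max_lifetime',
#               or an empty list
#   $generate - regenerates the data, stores it in the cache, returns it
sub compute_serving_stale {
	my ($lockfile, $fresh, $stale, $generate) = @_;

	# fast path: the cache entry is still valid
	my @result = $fresh->();
	return @result if @result;

	open my $lock_fh, '>', $lockfile
		or die "Couldn't open lockfile '$lockfile': $!";

	if (flock($lock_fh, LOCK_EX | LOCK_NB)) {
		# we got the writer's lock: regenerate while holding it,
		# so our client waits for the full data
		@result = $generate->();
		close $lock_fh;
		return @result;
	}

	# another process is regenerating: serve stale data if allowed
	@result = $stale->();
	if (@result) {
		close $lock_fh;
		return @result;
	}

	# no usable stale data: wait for the writer (reader's lock),
	# then read the freshly generated data
	flock($lock_fh, LOCK_SH);
	close $lock_fh;
	return $fresh->();
}
```

Unlike this sketch, the patch wraps the locking in a loop that repeats until it gets either data or the writer's lock. That way a reader whose writer died can try again.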

This feature of GitwebCache::FileCacheWithLocking is used only by the
->compute($key, $code) and ->compute_fh($key, $code_fh) methods.  It is
controlled by the 'max_lifetime' cache parameter.  Set it to -1 to always
serve stale data if it exists, or set it to 0 (or any value smaller than
'expires_min') to turn this feature off.
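A site that wants to limit how stale the served pages can get might override the default (5 hours in this patch) from its gitweb configuration file. This is a hedged sketch. It assumes the configuration file (e.g. gitweb_config.perl) is evaluated after gitweb.perl sets up %cache_options. It also assumes these options are then passed on to the cache constructor.

```perl
# Sketch of a gitweb_config.perl override (assumption: this file is
# evaluated after %cache_options is set up in gitweb.perl, so these
# assignments take precedence over the defaults).
our %cache_options;

# Serve stale data at most 1 hour old while another process regenerates it.
$cache_options{'max_lifetime'} = 60*60;

# Alternatively, always serve stale data if any exists:
# $cache_options{'max_lifetime'} = -1;
```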

This feature, as it is currently implemented, makes the ->compute() method
a bit asymmetric between the process that acquired the writer's lock and
the processes that didn't, as can be seen in the new test in t9503.  The
process that is to regenerate (refresh) data in the cache must wait for
the data to be generated in full before showing anything to the client,
while the other processes can show stale (expired) data immediately.  To
remove or reduce this asymmetry gitweb would need to employ one of two
alternative solutions.  Either the data should be (re)generated in the
background, so that the process that acquired the writer's lock would
serve stale data while generating fresh data in the background, or the
process that generates data should pass its output to the original STDOUT
while capturing it ("tee" the output).
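The second alternative ("tee" the output) can be sketched in Perl as follows. This is only an illustration under simplifying assumptions. generate_tee() is a hypothetical helper. The generating code writes through an explicit callback rather than printing to STDOUT as real gitweb code does. Capturing STDOUT itself would need something like a tied filehandle or a PerlIO layer.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, which emits output via the $out
# callback it is given.  Each chunk goes to $client_fh immediately and
# is also accumulated, so the caller can store the whole page in the
# cache once generation finishes.
sub generate_tee {
	my ($code, $client_fh) = @_;
	my $captured = '';
	my $out = sub {
		my $chunk = join('', @_);
		print {$client_fh} $chunk;   # client sees output right away
		$captured .= $chunk;         # ...and we keep a copy for the cache
	};
	$code->($out);
	return $captured;
}
```

With this approach the regenerating process no longer makes its client wait for the full page. It still pays the full generation cost, though, unlike the background-regeneration alternative.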

Note that a process that got stale data serves it immediately, so it is
not available to regenerate the data if the process regenerating it
died; see the commented-out TODO test in t9503.  Otherwise it would have
to wait to check whether the data got regenerated, which would defeat the
purpose of serving stale data for a fast return.

While developing this feature, the ->is_valid() method in the base class
GitwebCache::SimpleFileCache acquired an extra optional parameter, with
which one can pass an expire time instead of using the whole-cache global
(adaptive) expire time.
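The effect of the new optional parameter can be illustrated with a stand-alone version of the check. is_valid_file() is hypothetical. It takes a file path instead of a cache key, and a fixed global expire time instead of the adaptive one. The exact expiry boundary is also an assumption: an entry is treated as valid while its age is strictly less than the expire time, so 0 means "already expired". This matches the tests' use of 0 as "expire now".

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the check: is the cache file at
# $path present, non-empty, and younger than $expires_in seconds?
# $expires_in defaults to $global_expires_in; undef or a negative value
# means "never expires".
sub is_valid_file {
	my ($path, $global_expires_in, $expires_in) = @_;

	return 0 unless -f $path;
	return 0 unless -s _;   # reuse the stat buffer: non-empty file

	$expires_in = defined $expires_in ? $expires_in : $global_expires_in;
	return 1 unless (defined $expires_in && $expires_in >= 0);

	# assumed boundary: valid only while age < $expires_in
	my $mtime = (stat(_))[9];
	return ((time() - $mtime) < $expires_in) ? 1 : 0;
}
```

The patch passes get_max_lifetime() as that extra argument in _compute_generic(). This turns the same check into "is there stale data recent enough to serve?".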

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>
---
Compared to the previous version of this series, the parallel access
test got much improved (this actually started in an earlier commit).


This is the part that is possible _without_ regenerating the cache in
the background.

Note that this makes it explicit that serving stale data while some
process is regenerating the cache is possible only with locking enabled,
i.e. when using GitwebCache::FileCacheWithLocking.

 gitweb/gitweb.perl                             |    8 ++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  105 ++++++++++++++++++++++--
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |   22 ++++--
 t/t9503/test_cache_interface.pl                |   68 +++++++++++++++-
 4 files changed, 189 insertions(+), 14 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 72683be..454766c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -327,6 +327,14 @@ our %cache_options = (
 	# lifetime control.
 	# (Compatibile with Cache::Adaptive.)
 	'check_load' => \&get_loadavg,
+
+	# Maximum cache file life, in seconds.  If a cache entry's age exceeds
+	# this value, it is too stale to be served while waiting for the
+	# cache to be regenerated/refreshed; the process waits for fresh data
+	# instead of displaying the existing cache data.
+	# Set it to -1 to always serve existing data if it exists,
+	# set it to 0 to turn off serving stale data - always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 4d8114d..1d32810 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -25,7 +25,88 @@ use File::Path qw(mkpath);
 use Fcntl qw(:flock);
 
 # ......................................................................
-# constructor is inherited from GitwebCache::SimpleFileCache
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatible),
+#    'root_dir' (CHI::Driver::File compatible),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatible),
+#    'depth' (CHI::Driver::File compatible),
+#    The number of subdirectories deep to cache object items.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatible),
+#    'expires_in' (CHI compatible) [seconds]
+#    The expiration time for objects placed in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set;
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lifetime.
+#    Defaults to 60 seconds for 100% CPU load ('check_load' returning 1.0).
+#
+# (all the above are inherited from GitwebCache::SimpleFileCache)
+#
+#  * 'max_lifetime' [seconds]
+#    If it is greater than 0, and a cache entry is expired but not older
+#    than this, serve stale data while waiting for the cache entry to be
+#    regenerated (refreshed); -1 means always serve stale.  Non-adaptive.
+#    Defaults to -1 (never expire / always serve stale).
+sub new {
+	my $class = shift;
+	my %opts = ref $_[0] ? %{ $_[0] } : @_;
+
+	my $self = $class->SUPER::new(\%opts);
+
+	my ($max_lifetime);
+	if (%opts) {
+		$max_lifetime =
+			defined $opts{'max_lifetime'} ? $opts{'max_lifetime'} :
+			$opts{'max_cache_lifetime'};
+	}
+	$max_lifetime = -1 unless defined($max_lifetime);
+
+	$self->set_max_lifetime($max_lifetime);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_max_lifetime() and set_max_lifetime($max_lifetime) methods
+foreach my $i (qw(max_lifetime)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -67,7 +148,7 @@ sub _tempfile_to_path {
 
 sub _compute_generic {
 	my ($self, $key,
-	    $get_code, $set_code, $get_locked) = @_;
+	    $get_code, $fetch_code, $set_code, $fetch_locked) = @_;
 
 	my @result = $get_code->();
 	return @result if @result;
@@ -91,17 +172,23 @@ sub _compute_generic {
 				or die "Could't close lockfile '$lockfile': $!";
 
 		} else {
+			# try to retrieve stale data
+			@result = $fetch_code->()
+				if $self->is_valid($key, $self->get_max_lifetime());
+			return @result if @result;
+
 			# get readers lock (wait for writer)
+			# if there is no stale data to serve
 			flock($lock_fh, LOCK_SH);
 			# closing lockfile releases lock
-			if ($get_locked) {
-				@result = $get_code->();
+			if ($fetch_locked) {
+				@result = $fetch_code->();
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
 			} else {
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
-				@result = $get_code->();
+				@result = $fetch_code->();
 			}
 		}
 	} until (@result || $lock_state);
@@ -126,6 +213,9 @@ sub compute {
 			return $self->get($key);
 		},
 		sub {
+			return $self->fetch($key);
+		},
+		sub {
 			my $data = $code->();
 			$self->set($key, $data);
 			return $data;
@@ -152,9 +242,12 @@ sub compute_fh {
 			return $self->get_fh($key);
 		},
 		sub {
+			return $self->fetch_fh($key);
+		},
+		sub {
 			return $self->set_coderef_fh($key, $code_fh);
 		},
-		1 # $self->get_fh($key); just opens file
+		1 # $self->fetch_fh($key); just opens file
 	);
 }
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index aeb91d4..21ec434 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -365,12 +365,13 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
 #
 # Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired.  Uses the global per-cache expire time, unless
+# the optional $expires_in argument is passed.
 sub is_valid {
-	my ($self, $key) = @_;
+	my ($self, $key, $expires_in) = @_;
 
 	my $path = $self->path_to_key($key);
 
@@ -383,7 +384,7 @@ sub is_valid {
 	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
-	my $expires_in = $self->get_expires_in();
+	$expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
 	return 1 unless (defined $expires_in && $expires_in >= 0);
 
 	# is file expired?
@@ -441,18 +442,25 @@ sub compute {
 # ......................................................................
 # nonstandard interface methods
 
-sub get_fh {
+sub fetch_fh {
 	my ($self, $key) = @_;
 
 	my $path = $self->path_to_key($key);
 	return unless (defined $path);
 
-	return unless ($self->is_valid($key));
-
 	open my $fh, '<', $path or return;
 	return ($fh, $path);
 }
 
+
+sub get_fh {
+	my ($self, $key) = @_;
+
+	return unless ($self->is_valid($key));
+
+	return $self->fetch_fh($key);
+}
+
 sub set_coderef_fh {
 	my ($self, $key, $code) = @_;
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index c6a28f8..8a52261 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -22,7 +22,9 @@ BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
 diag("Using lib '$INC[0]'");
 diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
+	'max_lifetime' => 0, # turn it off
+} ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
@@ -295,6 +297,70 @@ subtest 'parallel access' => sub {
 	done_testing();
 };
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+
+subtest 'serving stale data when (re)generating' => sub {
+	$cache->set($key, $stale_value);
+	$call_count = 0;
+	$cache->set_expires_in(0);    # expire now
+	$cache->set_max_lifetime(-1); # forever (always serve stale data)
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	ok(scalar(grep { $_ eq $stale_value } @output),
+	   'stale data in at least one process when expired');
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value,
+	   'value got set correctly, even if stale data returned');
+
+
+# 	$cache->set($key, $stale_value);
+# 	unlink($lock_file);
+# 	@output = parallel_run {
+# 		my $data = eval { cache_compute($cache, $key, \&get_value_die_once); };
+# 		my $eval_error = $@;
+# 		print "$data" if defined $data;
+# 		print "$sep";
+# 		print "$eval_error" if defined $eval_error;
+# 	};
+#  TODO: {
+# 		local $TODO = 'not implemented';
+#
+# 		is_deeply(
+# 			[sort @output],
+# 			[sort ("$value${sep}", "${sep}get_value_die_once\n")],
+# 			'return non-stale value, even if process regenerating it died'
+# 		);
+#
+# 		$cache->set_expires_in(-1); # never expire for next ->get
+# 		is($cache->get($key), $value,
+# 		   'value got regenerated, even if process regenerating it died');
+# 	};
+# 	unlink($lock_file);
+
+	$cache->set($key, $stale_value);
+	$cache->set_expires_in(0);   # expire now
+	$cache->set_max_lifetime(0); # don't serve stale data
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	# no returning stale data
+	ok(!scalar(grep { $_ eq $stale_value } @output),
+	   'no stale data if configured');
+
+
+	done_testing();
+};
+$cache->set_expires_in(-1);
+
 done_testing();
 
 
-- 
1.7.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

