In the ->compute($key, $code) method from GitwebCache::SimpleFileCache,
use locking (via flock) to ensure that only one process generates data
to update or fill in the cache; the rest wait for the cache to be
(re)generated and read the data from the cache.  If the process that was
to (re)generate data dies or exits, one of the readers takes over its
role.

Currently this feature cannot be disabled (via %cache_options).  Other
future features (such as serving stale data while the cache is being
regenerated, (re)generating the cache in the background, or an activity
indicator) all depend on locking.

A test in t9503 shows that when two clients try to simultaneously access
a non-existent or stale cache entry (and generating data takes,
artificially, a bit of time), the data is (re)generated only once if
they use the ->compute method, as opposed to when those clients use just
the ->get/->set methods.

To be implemented (from original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for the cache to be
  filled, which in turn requires separating situations (like snapshots
  and other non-HTML responses) where we should not show a 'please
  wait' message

Note that there is a slight inefficiency: the filename for the lockfile
depends on the filename for the cache entry (it just adds a '.lock'
suffix), but it is recalculated from $key for both paths.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@xxxxxxxxxx>
Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>
---
Differences from v2:
* More comments.
* Add a loop in ->compute(), so that if the process that was to
  (re)generate data dies or exits, one of the readers takes over its
  role.  Such a situation should be rare: it can happen if two clients
  access the same invalid URL.
* Add a test for such a situation in t9503.

Differences from relevant parts of J.H. patch:
* Locking on a separate *.lock file, for simpler work and robustness;
  acquiring an exclusive lock might require having the file opened for
  possible write, following "File Locking Tricks and Traps"
  (http://perl.plover.com/yak/flock/).  J.H. patch used a separate *.bg
  file for locking only for background caching (to be implemented in
  one of the next commits).
* Ensure that ->compute() delivers $data, unless $code fails.  See
  above about adding a loop in the ->compute() method.
* A single variable $lock_state, in place of $lockingStatus,
  $lockStatBG, $lockStat, $lockStatus in J.H. patch.
* Use indirect (lexical) filehandles for lockfiles, instead of
  barewords, i.e. global filehandles:

	open my $lock_fh, '+>', $lockfile;

  rather than the equivalent of

	open LOCK, '>:utf8', $lockfile;

* Do not use LOCK_UN: closing the lockfile unlocks it.
* In my opinion, much cleaner flow control.

Possible improvements:
* When using locking, only one process is able to write to the
  temporary file.  The temporary file can therefore have a non-random
  name; there is no need for File::Temp nor unique_id (performance).
  It would probably be best done via a ->get_tempfile() method, or
  something like that.
* Run the tests in t9503 which contain 'sleep 1' (required to test how
  concurrent access is handled, e.g. the cache miss stampede situation)
  only when the test is run with --long.  This would require an update
  to t/test-lib.sh to export GIT_TEST_LONG to external programs in
  test_external etc.
* ->get_lockname should perhaps be ->get_lockfile, which would return
  an opened filehandle, or a filehandle + filename like
  File::Temp::tempfile, or an object that functions like a filehandle
  but stringifies to a filename like File::Temp->new().
* Equivalent of old _Make_Path?
* Would the ability to turn this feature off make sense?
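The writer/reader flow described above can be sketched outside gitweb.
The patch itself is Perl, but a minimal Python translation of the same
flock pattern may be easier to follow; note that compute(), generate(),
and the dict-backed cache below are illustrative stand-ins, not gitweb's
actual API:

```python
import fcntl

def compute(cache, key, generate, lockfile):
    """Fill cache[key] via generate(), letting only one process run it.

    Mirrors the do/until loop in the patch: try a non-blocking exclusive
    lock; the winner generates and stores the data, everyone else blocks
    on a shared lock and then re-reads the cache.  If the writer died
    before storing anything, a reader comes around the loop and tries to
    become the writer itself.
    """
    data = cache.get(key)
    if data is not None:
        return data

    while True:
        # the lockfile must be opened for writing for an exclusive
        # lock to be meaningful everywhere (cf. the '+>' open above)
        fh = open(lockfile, "a+")
        try:
            try:
                fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
                got_writer_lock = True
            except BlockingIOError:
                got_writer_lock = False

            if got_writer_lock:
                data = generate(key)             # we are the one writer
                cache[key] = data
            else:
                fcntl.flock(fh, fcntl.LOCK_SH)   # wait for the writer
                data = cache.get(key)
        finally:
            fh.close()                           # closing drops the lock

        # stop once we have data, or once we were the writer (even if
        # generate() produced nothing -- matches the Perl 'until')
        if data is not None or got_writer_lock:
            return data
```

As in the Perl version, a reader that wakes up to a still-empty cache
loops back and competes to become the writer.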
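The "closing the lockfile unlocks it" point (and why no LOCK_UN is
needed) rests on flock(2) semantics that are easy to check.  A short
Python illustration, with a tempfile standing in for the real lockfile
path:

```python
import fcntl
import tempfile

# flock() locks belong to the open file description, so two independent
# opens of the same lockfile contend with each other, and close() alone
# releases the lock -- no explicit LOCK_UN is required.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()

writer = open(tmp.name, "a+")       # '+>'-style open: writable handle
fcntl.flock(writer, fcntl.LOCK_EX)  # writer holds the exclusive lock

reader = open(tmp.name, "a+")
try:
    fcntl.flock(reader, fcntl.LOCK_EX | fcntl.LOCK_NB)
    saw_contention = False          # would mean flock() is broken here
except BlockingIOError:
    saw_contention = True           # expected: lockfile already locked

writer.close()                      # closing releases the lock ...
fcntl.flock(reader, fcntl.LOCK_SH | fcntl.LOCK_NB)
shared_after_close = True           # ... so the shared lock now succeeds
reader.close()
```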
 gitweb/cache.pm                 |   44 +++++++++++++++-
 t/t9503/test_cache_interface.pl |  107 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+), 3 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index b828102..25f0278 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -25,6 +25,7 @@ use File::Path qw(mkpath);
 use File::Spec;
 use File::Temp;
 use Digest::MD5 qw(md5_hex);
+use Fcntl qw(:flock);
 
 # by default, the cache nests all entries on the filesystem two
 # directories deep
@@ -226,6 +227,13 @@ sub path_to_key {
 	return $filepath;
 }
 
+# Take a human-readable key, and return the path to be used for the lockfile
+sub get_lockname {
+	my $self = shift;
+
+	return $self->path_to_key(@_) . '.lock';
+}
+
 # ----------------------------------------------------------------------
 # worker methods
 
@@ -393,15 +401,45 @@ sub get {
 # get $key; if successful, returns the value.  Otherwise, calls $code
 # and uses the return value as the new value for $key, which is then
 # returned.
+# Uses file locking to have only one process updating the value for $key,
+# to avoid the 'cache miss stampede' (aka 'stampeding herd') problem.
 sub compute {
 	my ($self, $p_key, $p_coderef) = @_;
 
 	my $data = $self->get($p_key);
-	if (!defined $data) {
-		$data = $p_coderef->($self, $p_key);
-		$self->set($p_key, $data);
+	return $data if defined $data;
+
+	my $dir;
+	my $lockfile = $self->get_lockname($p_key, \$dir);
+
+	# ensure that directory leading to lockfile exists
+	if (!-d $dir) {
+		mkpath($dir, 0, 0777)
+			or die "Couldn't mkpath '$dir' for lockfile: $!";
 	}
+	# this loop is to protect against situation where process that
+	# acquired exclusive lock (writer) dies or exits (die_error)
+	# before writing data to cache
+	my $lock_state; # needed for loop condition
+	do {
+		open my $lock_fh, '+>', $lockfile
+			or die "Couldn't open lockfile '$lockfile': $!";
+		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+		if ($lock_state) {
+			# acquired writer's lock
+			$data = $p_coderef->($self, $p_key);
+			$self->set($p_key, $data);
+		} else {
+			# get reader's lock
+			flock($lock_fh, LOCK_SH);
+			$data = $self->fetch($p_key);
+		}
+		# closing lockfile releases lock
+		close $lock_fh
+			or die "Couldn't close lockfile '$lockfile': $!";
+	} until (defined $data || $lock_state);
+	# repeat until we have data, or we tried generating data ourselves and failed
 
 	return $data;
 }
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index d28f4df..255fad2 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -89,6 +89,113 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
 is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
 cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
 
+# Test 'stampeding herd' / 'cache miss stampede' problem
+# (probably should be run only if GIT_TEST_LONG)
+#
+sub get_value_slow {
+	$call_count++;
+	sleep 1;
+	return $value;
+}
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = $cache->get($key);
+	if (!defined $data) {
+		$data = get_value_slow();
+		$cache->set($key, $data);
+	}
+
+	if ($pid) {
+		my $child_count = <$kid_fh>;
+		chomp $child_count;
+
+		waitpid $pid, 0;
+		close $kid_fh;
+
+		$call_count += $child_count;
+	} else {
+		print "$call_count\n";
+		exit 0;
+	}
+
+	cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	if ($pid) {
+		my $child_count = <$kid_fh>;
+		chomp $child_count;
+
+		waitpid $pid, 0;
+		close $kid_fh;
+
+		$call_count += $child_count;
+	} else {
+		print "$call_count\n";
+		exit 0;
+	}
+
+	cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+# Test that it doesn't hang if get_action exits/dies
+#
+sub get_value_die {
+	die "get_value_die\n";
+}
+print STDERR "$$: start\n";
+my $result = eval {
+	$pid = open $kid_fh, '-|';
+	SKIP: {
+		skip "cannot fork: $!", 2
+			unless defined $pid;
+
+		local $SIG{ALRM} = sub { die "alarm\n"; };
+		alarm 4;
+
+		my ($eval_error, $child_eval_error);
+		eval { $cache->compute('No Key', \&get_value_die); };
+		$eval_error = $@;
+
+		if ($pid) {
+			local $/ = "\0";
+
+			$child_eval_error = <$kid_fh>;
+			chomp $child_eval_error;
+
+			waitpid $pid, 0;
+			close $kid_fh;
+		} else {
+
+			print "$eval_error\0";
+			exit 0;
+		}
+
+		is($eval_error, "get_value_die\n", 'get_value_die() died in parent');
+		is($child_eval_error, "get_value_die\n", 'get_value_die() died in child');
+
+		alarm 0;
+	}
+};
+ok(!$@, 'no alarm call (neither process hung)');
+diag($@) if $@;
+
+
 # Test cache expiration for 'expire now'
 #
 $cache->set_expires_in(60*60*24); # set expire time to 1 day
-- 
1.6.6.1
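For readers unfamiliar with Perl's forking open, the overall shape of
the t9503 concurrency tests can be sketched in Python; fork_and_count()
is an illustrative name for this sketch, not part of the test suite.
Perl's `open $kid_fh, '-|'` forks and hands the parent a pipe from the
child's stdout; child and parent then do the same cache access, and the
parent sums both call counts:

```python
import os

def fork_and_count(work):
    """Run work() in both parent and child; the child reports its count
    over a pipe, as the t9503 tests do with the '-|' filehandle."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                    # child: do the work, report, exit
        os.close(r)
        count = work()
        os.write(w, str(count).encode())
        os._exit(0)
    os.close(w)                     # parent: same work, then read child's count
    count = work()
    child_count = int(os.read(r, 64) or b"0")
    os.waitpid(pid, 0)
    os.close(r)
    return count + child_count
```

With plain get/set both processes regenerate the data (total 2); with
compute() only one does (total 1), which is exactly what the two
cmp_ok() checks above assert.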