Re: [PATCH] SUNRPC/cache: Allow garbage collection of invalid cache entries

"J. Bruce Fields" <bfields@xxxxxxxxxx> · Thu, 26 Mar 2020 21:50:12 -0400

On Thu, Mar 26, 2020 at 09:42:19PM +0000, Trond Myklebust wrote:
> On Thu, 2020-03-26 at 16:40 -0400, bfields@xxxxxxxxxxxx wrote:
> > Maybe the cache_is_expired() logic should be something more like:
> > 
> > 	if (h->expiry_time < seconds_since_boot())
> > 		return true;
> > 	if (!test_bit(CACHE_VALID, &h->flags))
> > 		return false;
> > 	return detail->flush_time >= h->last_refresh;
> > 
> > So invalid cache entries (which are waiting for a reply from mountd) can
> > expire, but they can't be flushed.  If that makes sense.
> > 
> > As a stopgap we may want to revert or drop the "Allow garbage
> > collection" patch, as the (preexisting) memory leak seems lower impact
> > than the server hang.
> 
> I believe you were probably seeing the effect of the
> cache_listeners_exist() test, which is just wrong for all cache upcall
> users except idmapper and svcauth_gss. We should not be creating
> negative cache entries just because the rpc.mountd daemon happens to be
> slow to connect to the upcall pipes when starting up, or because it
> crashes and fails to restart correctly.
> 
> That's why, when I resubmitted this patch, I included 
> https://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commitdiff;h=b840228cd6096bebe16b3e4eb5d93597d0e02c6d
> 
> which turns off that particular test for all the upcalls to rpc.mountd.

The hangs persist with that patch, but go away with the change to the
cache_is_expired() logic above.
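
For concreteness, here is a minimal userspace sketch of that ordering: the
expiry check applies to every entry, while the flush check is skipped for
entries that are not yet valid.  The struct names, the CACHE_VALID_MASK bit
and the explicit "now" argument are stand-ins invented for illustration, not
the kernel definitions, and the final comparison of flush_time against
last_refresh is an assumption about what "flushed" means here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CACHE_VALID flag bit. */
#define CACHE_VALID_MASK 0x1UL

/* Simplified model of a cache entry (not the kernel's struct cache_head). */
struct cache_head_model {
	long expiry_time;	/* entry is stale once now > expiry_time */
	long last_refresh;	/* when the entry was last updated */
	unsigned long flags;	/* CACHE_VALID_MASK set once the upcall replied */
};

/* Simplified model of the per-cache state (not struct cache_detail). */
struct cache_detail_model {
	long flush_time;	/* entries refreshed at or before this are flushed */
};

static bool cache_is_expired_model(const struct cache_detail_model *detail,
				   const struct cache_head_model *h, long now)
{
	/* Any entry, valid or not, goes away once its expiry time has passed. */
	if (h->expiry_time < now)
		return true;
	/* An entry still waiting for its upcall reply is immune to flushing. */
	if (!(h->flags & CACHE_VALID_MASK))
		return false;
	/* Valid entries are also dropped by a flush newer than their refresh. */
	return detail->flush_time >= h->last_refresh;
}
```

With flush_time = 100 and now = 150, an invalid entry refreshed at 50 that
expires at 200 survives the flush, while a valid entry with the same times is
treated as expired.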

--b.