Re: [PATCH] SUNRPC/cache: Allow garbage collection of invalid cache entries

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 27 Mar 2020 12:33:30 +0000

On Thu, 2020-03-26 at 21:50 -0400, J. Bruce Fields wrote:
> On Thu, Mar 26, 2020 at 09:42:19PM +0000, Trond Myklebust wrote:
> > On Thu, 2020-03-26 at 16:40 -0400, bfields@xxxxxxxxxxxx wrote:
> > > Maybe the cache_is_expired() logic should be something more like:
> > > 
> > > 	if (h->expiry_time < seconds_since_boot())
> > > 		return true;
> > > 	if (!test_bit(CACHE_VALID, &h->flags))
> > > 		return false;
> > > 	return h->expiry_time < seconds_since_boot();

Did you mean

	return detail->flush_time >= h->last_refresh;

instead of repeating the h->expiry_time check?
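
To make the suggested ordering concrete, here is a small user-space model of it. It is only a sketch and not the kernel code. The `_model` structs and functions are hypothetical stand-ins for struct cache_head, struct cache_detail and cache_is_expired(). The `valid` field models the CACHE_VALID bit, and `now` replaces seconds_since_boot(). It assumes the final return is the flush_time comparison suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for struct cache_head and
 * struct cache_detail, keeping only the fields the check reads. */
struct cache_head_model {
	long long expiry_time;  /* entry is unusable after this time */
	long long last_refresh; /* when the entry was last filled in */
	bool valid;             /* models test_bit(CACHE_VALID, &h->flags) */
};

struct cache_detail_model {
	long long flush_time;   /* entries refreshed at or before this are flushed */
};

/* Sketch of the suggested ordering; 'now' stands in for seconds_since_boot(). */
bool cache_is_expired_model(const struct cache_detail_model *detail,
			    const struct cache_head_model *h, long long now)
{
	/* Any entry past its expiry time, valid or not, can be collected. */
	if (h->expiry_time < now)
		return true;
	/* An entry still waiting for a reply from mountd is never flushed. */
	if (!h->valid)
		return false;
	/* A valid entry is stale if a flush happened at or after its last refresh. */
	return detail->flush_time >= h->last_refresh;
}

/* Usage: a pending entry ignores the flush but still expires on time. */
void cache_is_expired_examples(void)
{
	struct cache_detail_model d = { .flush_time = 50 };
	struct cache_head_model pending = { 120, 40, false };
	struct cache_head_model stale = { 120, 40, true };

	assert(!cache_is_expired_model(&d, &pending, 100)); /* flush ignored */
	assert(cache_is_expired_model(&d, &pending, 130));  /* past expiry */
	assert(cache_is_expired_model(&d, &stale, 100));    /* flushed */
}
```

The point of the ordering is that expiry applies to every entry, while a flush only invalidates entries that have actually been filled in.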

> > > 
> > > So invalid cache entries (which are waiting for a reply from
> > > mountd) can expire, but they can't be flushed.  If that makes sense.
> > > 
> > > As a stopgap we may want to revert or drop the "Allow garbage
> > > collection" patch, as the (preexisting) memory leak seems lower impact
> > > than the server hang.
> > 
> > I believe you were probably seeing the effect of the
> > cache_listeners_exist() test, which is just wrong for all cache upcall
> > users except idmapper and svcauth_gss. We should not be creating
> > negative cache entries just because the rpc.mountd daemon happens to be
> > slow to connect to the upcall pipes when starting up, or because it
> > crashes and fails to restart correctly.
> > 
> > That's why, when I resubmitted this patch, I included 
> > https://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commitdiff;h=b840228cd6096bebe16b3e4eb5d93597d0e02c6d
> > 
> > which turns off that particular test for all the upcalls to
> > rpc.mountd.
> 
> The hangs persist with that patch, but go away with the change to the
> cache_is_expired() logic above.

Fair enough. Do you want to send Chuck a fix?
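
A toy model of the upcall policy discussed above may also help: a missing listener should only produce a negative entry for caches that opt into that behavior (idmapper, svcauth_gss), never for the rpc.mountd caches. Everything in it is hypothetical. `negate_if_no_listener`, `upcall_model()` and the outcome enum are invented for illustration. The sketch does not reproduce how the linked commit actually disables the cache_listeners_exist() test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of an upcall attempt in this toy model. */
enum upcall_outcome {
	UPCALL_PENDING, /* request stays queued until a daemon reads it */
	UPCALL_NEGATED, /* entry is immediately marked negative */
};

/* Hypothetical per-cache policy bit; not a real kernel field. */
struct upcall_policy_model {
	bool negate_if_no_listener;
};

enum upcall_outcome upcall_model(const struct upcall_policy_model *p,
				 bool listeners_exist)
{
	/* Only caches that opt in treat an unread upcall pipe as a
	 * final negative answer. */
	if (!listeners_exist && p->negate_if_no_listener)
		return UPCALL_NEGATED;
	/* Everyone else keeps waiting, so a slow-starting or restarted
	 * rpc.mountd can still answer the request. */
	return UPCALL_PENDING;
}

/* Usage: mountd-style caches no longer go negative just because the
 * daemon is not connected yet. */
void upcall_model_examples(void)
{
	struct upcall_policy_model gss = { .negate_if_no_listener = true };
	struct upcall_policy_model mountd = { .negate_if_no_listener = false };

	assert(upcall_model(&gss, false) == UPCALL_NEGATED);
	assert(upcall_model(&mountd, false) == UPCALL_PENDING);
	assert(upcall_model(&mountd, true) == UPCALL_PENDING);
}
```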

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx