Re: filecache LRU performance regression

> On May 27, 2022, at 4:37 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> 
> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
>> 
>> 
>> Hi Frank-
>> 
>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
>> Do you have a reproducer I can try?
> 
> Hi Chuck,
> 
> The easiest way to reproduce the issue is to run generic/531 over an
> NFSv4 mount, using a system with a larger number of CPUs on the client
> side (or just scaling the test up manually - it has a calculation based
> on the number of CPUs).
> 
> The test will take a long time to finish. I initially described the
> details here:
> 
> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> 
> Since then, it was also reported here:
> 
> https://lore.kernel.org/all/20210531125948.2D37.409509F4@xxxxxxxxxxxx/T/#m8c3e4173696e17a9d5903d2a619550f352314d20

Thanks for the summary. So, there isn't a bugzilla tracking this
issue? If not, please create one here:

  https://bugzilla.linux-nfs.org/

Then we don't have to keep asking for a repeat summary ;-)


> I posted an experimental patch, but it's actually not quite correct
> (although I think the idea behind it makes sense):
> 
> https://lore.kernel.org/linux-nfs/20201020183718.14618-4-trondmy@xxxxxxxxxx/T/#m869aa427f125afee2af9a89d569c6b98e12e516f

A handful of random comments:

- nfsd_file_put() is now significantly different from what it used
  to be, so that part of the patch will have to be updated in
  order to apply to v5.18+

- IMO maybe that hash table should be replaced with something
  more memory- and CPU-cache-efficient, like a Maple tree. That
  patch can certainly be developed independently of the other
  LRU/GC related fixes you proposed. (Added Matthew and Liam
  for their take).

How can we take this forward? Would you like to continue
developing these patches, or should I adopt the project?
Maybe Matthew or Liam see a straightforward way to convert
the nfsd_file hash table as a separate project?
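For reference, a Maple-tree-backed lookup might look roughly like the
sketch below. This is purely illustrative, not a patch: keying on
inode->i_ino alone is an assumption (the real nfsd_file key also
involves the open mode and credentials), and locking around the tree
operations is hand-waved.

```c
/* Hypothetical kernel-style sketch only -- not buildable as-is. */
#include <linux/maple_tree.h>

static DEFINE_MTREE(nfsd_file_mt);

static struct nfsd_file *nfsd_file_find(struct inode *inode)
{
	/* O(log n) lookup with better cache locality than chasing
	 * hash-bucket list pointers. */
	return mtree_load(&nfsd_file_mt, inode->i_ino);
}

static int nfsd_file_insert(struct inode *inode, struct nfsd_file *nf)
{
	return mtree_store(&nfsd_file_mt, inode->i_ino, nf, GFP_KERNEL);
}
```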


> The basic problem from the initial email I sent:
> 
>> So here's what happens: for NFSv4, files that are associated with an
>> open stateid can stick around for a long time, as long as there's no
>> CLOSE done on them. That's what's happening here. Also, since those files
>> have a refcount of >= 2 (one for the hash table, one for being pointed to
>> by the state), they are never eligible for removal from the file cache.
>> Worse, since the code calls nfsd_file_gc inline if the upper bound is crossed
>> (8192), every single operation that calls nfsd_file_acquire will end up
>> walking the entire LRU, trying to free files, and failing every time.
>> Walking a list with millions of files every single time isn't great.

Walking a linked list is just about the worst thing to do
to the cache on a modern CPU. I vote for replacing that
data structure.


> I guess the basic issues here are:
> 
> 1) Should these NFSv4 files be in the filecache at all? They are readily
>   available from the state, no need for additional caching.

I agree that LRU and garbage collection are not needed for
managing NFSv4 files. NFSv4 OPEN/CLOSE and lease destruction
are already effective at that.

Re-reading the old email thread, I'm not sure exactly what
"turning off the filecache for NFSv4" means. Fleshing
that idea out could be helpful. Would it make sense to simply
disable the GC walk when releasing NFSv4 files?

Interestingly, the dcache has the same problem with negative
dentries, which can grow without bound, creating a significant
performance impact on systems with high uptime. There were a
few other similar cases discussed at the recent LSF/MM.


> 2) In general: can state ever be gracefully retired if the client still
>   has an OPEN?

I think that's what NFS4ERR_ADMIN_REVOKED is for? It's not
exactly graceful, but the server can warn the client that
state is gone.

Also, I expect a server might recall a delegation if it needs
to reclaim resources. We've talked in the past about a server
that recalls in order to cap the number of delegations that
need to be returned by the client when the lease is destroyed.


--
Chuck Lever




