On Thu, Jun 25, 2020 at 01:10:21PM -0400, Bruce Fields wrote: > > On Mon, Jun 08, 2020 at 07:21:22PM +0000, Frank van der Linden wrote: > > So here's what happens: for NFSv4, files that are associated with an > > open stateid can stick around for a long time, as long as there's no > > CLOSE done on them. That's what's happening here. Also, since those files > > have a refcount of >= 2 (one for the hash table, one for being pointed to > > by the state), they are never eligible for removal from the file cache. > > Worse, since the code call nfs_file_gc inline if the upper bound is crossed > > (8192), every single operation that calls nfsd_file_acquire will end up > > walking the entire LRU, trying to free files, and failing every time. > > Walking a list with millions of files every single time isn't great. > > Thanks for tracking this down. > > > > > There are some ways to fix this behavior like: > > > > * Always allow v4 cached file structured to be purged from the cache. > > They will stick around, since they still have a reference, but > > at least they won't slow down cache handling to a crawl. > > If they have to stick around anyway it seems too bad not to be able to > use them. > > I mean, just because a file's opened first by a v4 user doesn't mean it > might not also have other users, right? > > Would it be that hard to make nfsd_file_gc() a little smarter? > > I don't know, maybe it's not worth it. > > --b. Basically, opening, and keeping open, a very large number of v4 files on a client blows up these data structures: * nfs4state.c:file_hashtbl (FH -> nfs4_file) ..and with the addition of filecache: * filecache.c:nfsd_file_hashtbl (ino -> nfsd_file) * filecache.c:nfsd_file_lru nfsd_file_lru causes the most pain, see my description. But the other ones aren't without pain either. I tried an experiment where v4 files don't get added to the filecache, and file_hashtbl started showing up in perf output in a serious way. Not surprising, really, if you hash millions of items in a hash table with 256 buckets. I guess there is an argument to be made that it's such an extreme use case that it's not worth it. On the other hand, clients running the server out of resources and slowing down everything by a lot for all clients isn't great either. Generally, the only way to enforce an upper bound on resource usage without returning permanent errors (to which the client might react badly) seems to be to start invaliding v4 state under pressure. Clients should be prepared for this, as they should be able to recover from a server reboot. On the other hand, it's something you probably only should be doing as a last resort. I'm not sure if consistent behavior for e.g. locks could be guaranteed, I am not very familiar with the locking code. Some ideas to alleviate the pain short of doing the above: * Count v4 references to nfsd_file (filecache) structures. If there is a v4 reference, don't have the file on the LRU, as it's pointless. Do include it in the hash table so that v2/v3 users can find it. This avoids the worst offender (nfsd_file_lru), but does still blow up nfsd_file_hashtbl. * Use rhashtable for the hashtables, as it can automatically grow/shrink the number of buckets. I don't know if the rhashtable code could handle the load, but it might be worth a shot. - Frank