Re: filecache LRU performance regression

> On Jun 1, 2022, at 5:18 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> 
> On Wed, Jun 01, 2022 at 04:37:47PM +0000, Chuck Lever III wrote:
>> 
>>> On Jun 1, 2022, at 12:10 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>>> 
>>> On Wed, Jun 01, 2022 at 12:34:34AM +0000, Chuck Lever III wrote:
>>>>> On May 27, 2022, at 5:34 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On May 27, 2022, at 4:37 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Frank-
>>>>>>> 
>>>>>>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
>>>>>>> Do you have a reproducer I can try?
>>>>>> 
>>>>>> Hi Chuck,
>>>>>> 
>>>>>> The easiest way to reproduce the issue is to run generic/531 over an
>>>>>> NFSv4 mount, using a system with a larger number of CPUs on the client
>>>>>> side (or just scaling the test up manually - it has a calculation based
>>>>>> on the number of CPUs).
>>>>>> 
>>>>>> The test will take a long time to finish. I initially described the
>>>>>> details here:
>>>>>> 
>>>>>> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>>>>>> 
>>>>>> Since then, it was also reported here:
>>>>>> 
>>>>>> https://lore.kernel.org/all/20210531125948.2D37.409509F4@xxxxxxxxxxxx/T/#m8c3e4173696e17a9d5903d2a619550f352314d20
>>>>> 
>>>>> Thanks for the summary. So, there isn't a bugzilla tracking this
>>>>> issue? If not, please create one here:
>>>>> 
>>>>> https://bugzilla.linux-nfs.org/
>>>>> 
>>>>> Then we don't have to keep asking for a repeat summary ;-)
>>>> 
>>>> I can easily reproduce this scenario in my lab. I've opened:
>>>> 
>>>> https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
>>>> 
>>> 
>>> Thanks for taking care of that. I'm switching jobs, so I won't have much
>>> time to look at it or test for a few weeks.
>> 
>> No problem. I can reproduce the failure, and I have some ideas
>> of how to address the issue, so I've assigned the bug to myself.
>> 
>> 
>>> I think the basic problem is that the filecache is a clear win for
>>> v3, since that's stateless and it avoids a lookup for each operation.
>>> 
>>> For v4, it's not clear to me that it's much of a win, and in this case
>>> it definitely gets in the way.
>>> 
>>> Maybe the best thing is to not bother at all with the caching for v4,
>> 
>> At this point I don't think we can go that way. The NFSv4 code
>> uses a lot of the same infrastructural helpers as NFSv3, and
>> all of those now depend on the use of nfsd_file objects.
>> 
>> Certainly, though, the filecache plays somewhat different roles
>> for legacy NFS and NFSv4. I've been toying with the idea of
>> maintaining separate filecaches for NFSv3 and NFSv4, since
>> the garbage collection and shrinker rules are fundamentally
>> different for the two, and NFSv4 wants a file closed completely
>> (no lingering open) when it does a CLOSE or DELEGRETURN.
>> 
>> In the meantime, the obvious culprit is that the LRU walk during
>> garbage collection is broken. I've talked with Dave Chinner,
>> co-author of list_lru, about a way to straighten this out so
>> that the LRU walk is very nicely bounded and at the same time
>> deals properly with NFSv4 OPEN and CLOSE. Trond also had an
>> idea or two here, and it seems the three of us are on nearly
>> the same page.
>> 
>> Once that is addressed, we can revisit Wang's suggestion of
>> serializing garbage collection, as a nice optimization.
> 
> Sounds good, thanks!
> 
> A related issue: there is currently no upper limit that I can see
> on the number of active OPENs for a client. So essentially, a
> client can run a server out of resources by doing a very large
> number of OPENs.
> 
> Should there be an upper limit, above which requests are either
> denied, or old state is invalidated?

We need to explore the server's behavior in low-resource
situations. I prefer graceful degradation of service over
adding admin knobs, perhaps combined with some mechanism that
reports unruly clients so the server admin can deal with them.

The VFS will stop handing out struct file objects after a certain
point, and NFSv4 OPEN operations will then return an error.

Some of the server's resource pools have shrinker callbacks.
Those pools will be reaped when there is memory pressure. We
could do better there.

I don't believe there's any mechanism yet to get rid of state
that is still active. I consider that a "heroic measure"
that adds complexity for perhaps little benefit. And doing so
is potentially catastrophic for clients that trust the server
won't rip the rug out from under them. I would rather prevent
the creation of new state at that point than invalidate
existing state.

There are ways for server administrators to clean out known
defunct clients. That would perhaps be a little safer, and it
fits the idea that server admins should deal mindfully with
malfunctioning or malicious clients that are causing a denial
of service.

In the end, we have to address problems like CPU soft lockup to
be certain that the server's administrative interfaces remain
available while the server is under a significant workload.


--
Chuck Lever






