Re: filecache LRU performance regression

> On Jun 1, 2022, at 12:10 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> 
> On Wed, Jun 01, 2022 at 12:34:34AM +0000, Chuck Lever III wrote:
>>> On May 27, 2022, at 5:34 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>>> On May 27, 2022, at 4:37 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>>>> 
>>>> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
>>>>> 
>>>>> 
>>>>> Hi Frank-
>>>>> 
>>>>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
>>>>> Do you have a reproducer I can try?
>>>> 
>>>> Hi Chuck,
>>>> 
>>>> The easiest way to reproduce the issue is to run generic/531 over an
>>>> NFSv4 mount, using a system with a larger number of CPUs on the client
>>>> side (or just scaling the test up manually - it has a calculation based
>>>> on the number of CPUs).
>>>> 
>>>> The test will take a long time to finish. I initially described the
>>>> details here:
>>>> 
>>>> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>>>> 
>>>> Since then, it was also reported here:
>>>> 
>>>> https://lore.kernel.org/all/20210531125948.2D37.409509F4@xxxxxxxxxxxx/T/#m8c3e4173696e17a9d5903d2a619550f352314d20
>>> 
>>> Thanks for the summary. So, there isn't a bugzilla tracking this
>>> issue? If not, please create one here:
>>> 
>>> https://bugzilla.linux-nfs.org/
>>> 
>>> Then we don't have to keep asking for a repeat summary ;-)
>> 
>> I can easily reproduce this scenario in my lab. I've opened:
>> 
>>  https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
>> 
> 
> Thanks for taking care of that. I'm switching jobs, so I won't have much
> time to look at it or test for a few weeks.

No problem. I can reproduce the failure, and I have some ideas
about how to address the issue, so I've assigned the bug to myself.


> I think the basic problem is that the filecache is a clear win for
> v3, since that's stateless and it avoids a lookup for each operation.
> 
> For v4, it's not clear to me that it's much of a win, and in this case
> it definitely gets in the way.
> 
> Maybe the best thing is to not bother at all with the caching for v4,

At this point I don't think we can go that way. The NFSv4 code
uses a lot of the same infrastructural helpers as NFSv3, and
all of those now depend on the use of nfsd_file objects.

Certainly, though, the filecache plays somewhat different roles
for legacy NFS and NFSv4. I've been toying with the idea of
maintaining separate filecaches for NFSv3 and NFSv4, since
the garbage collection and shrinker rules are fundamentally
different for the two, and NFSv4 wants a file closed completely
(no lingering open) when it does a CLOSE or DELEGRETURN.

In the meantime, the obvious culprit is that the LRU walk during
garbage collection is broken. I've talked with Dave Chinner,
co-author of list_lru, about a way to straighten this out so
that the LRU walk is very nicely bounded and at the same time
deals properly with NFSv4 OPEN and CLOSE. Trond also had an
idea or two here, and it seems the three of us are on nearly
the same page.
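
To make "bounded walk" concrete, here is a rough userspace model in
Python. It is only a sketch of the idea, not the kernel code and not
the eventual fix. list_lru's walk does take a limit on the number of
items to visit, and nr_to_walk below mirrors that; everything else
(CachedFile, FileCache, touch, close_now, the "referenced" bit) is a
hypothetical name invented for this illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedFile:
    """Hypothetical stand-in for a cached open file (not the kernel's nfsd_file)."""
    name: str
    in_use: int = 0           # active users; an in-use entry is never closed by the walk
    referenced: bool = False  # touched since the last walk ("second chance" bit)


class FileCache:
    """Toy model of a file cache with an LRU list; the oldest entry comes first."""

    def __init__(self):
        self.lru = OrderedDict()

    def add(self, name):
        entry = CachedFile(name)
        self.lru[name] = entry
        return entry

    def touch(self, name):
        # Mark the entry as recently used without reordering the list.
        self.lru[name].referenced = True

    def walk(self, nr_to_walk):
        """Visit at most nr_to_walk entries from the cold end.

        Returns the names that were closed. The cost of one call is
        bounded by nr_to_walk, no matter how large the cache is.
        """
        closed = []
        budget = min(nr_to_walk, len(self.lru))
        for _ in range(budget):
            name, entry = next(iter(self.lru.items()))
            if entry.in_use or entry.referenced:
                # Give it another pass: clear the bit and rotate to the warm end.
                entry.referenced = False
                self.lru.move_to_end(name)
            else:
                del self.lru[name]
                closed.append(name)
        return closed

    def close_now(self, name):
        """Model of an NFSv4 CLOSE: drop the entry at once, wherever it sits in the LRU."""
        return self.lru.pop(name, None) is not None
```

In this model an NFSv3-style entry lingers until a walk ages it out,
while an NFSv4-style CLOSE goes through close_now() and never waits
for garbage collection.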

Once that is addressed, we can revisit Wang's suggestion of
serializing garbage collection, as a nice optimization.

Good luck with your new position!


> although that might hurt mixed v3/v4 clients accessing the same fs
> slightly. Not sure how common of a scenario that is, though.



--
Chuck Lever






