Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)

[[Mel: if you read through to the end you'll see why I cc:ed you on this]]

On Fri, 27 Aug 2021, Mike Javorski wrote:
> I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2
> 
> At first I thought it might be "working" because I only got freezes
> with 4.2 at first, but I went back and re-tested (to be sure) and got
> freezes with all 4 versions. So the nfsvers setting doesn't seem to
> have an impact. I did verify at each pass that the 'nfsvers=' value
> was present and correct in the mount output.
> 
> FYI: another user posted on the archlinux reddit with a similar issue,
> I suggested they try with a 5.12 kernel and that "solved" the issue
> for them as well.
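
A check like the one Mike describes (confirming the NFS version actually in use on each pass) can be scripted. This is a minimal sketch, assuming the client lists its mounts in `/proc/mounts` format and that the NFS client reports the negotiated version there as a `vers=` option. The helper name `nfs_versions` is hypothetical.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to the vers= value in its option string.

    Assumes /proc/mounts layout: device mountpoint fstype options dump pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS filesystem.
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        for opt in fields[3].split(","):
            if opt.startswith("vers="):
                result[fields[1]] = opt[len("vers="):]
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mnt, vers in nfs_versions(f.read()).items():
            print(f"{mnt}: vers={vers}")
```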

Well... I have good news and I have bad news.

First the good.
I reviewed all the symptoms again and browsed the commits between the
working and non-working kernels, and the only pattern that made any sense
was some issue with memory allocation.  The pauses, I reasoned, were most
likely pauses while allocating memory.

So instead of testing in a VM with 2G of RAM, I tried 512MB, and
suddenly the problem was trivial to reproduce.  Specifically I created a
(sparse) 1GB file on the test VM, exported it over NFS, and ran "md5sum"
on the file from an NFS client.  With 5.12 this reliably takes about 90 seconds
(as it does with 2G RAM).  On 5.13 and 512MB RAM, it usually takes a lot
longer.  5, 6, 7, 8 minutes (and assorted seconds).
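
The reproduction recipe above can be scripted roughly as follows. This is a Python stand-in for the `md5sum` step, not what was actually run; the script name `repro.py`, the paths, and the helper names are hypothetical. Calling `truncate()` on a new file leaves a hole, so the 1GB file occupies almost no disk space on the server and reads back as zeros.

```python
import hashlib
import sys
import time


def make_sparse_file(path, size):
    """Create a file of `size` bytes with no data blocks written (sparse)."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file with a hole; reads return zeros


def timed_md5(path, chunk=1 << 20):
    """Read the whole file, returning (md5 hex digest, elapsed seconds)."""
    h = hashlib.md5()
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            h.update(buf)
    return h.hexdigest(), time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage:
    #   server: python3 repro.py create /srv/export/sparse-1g
    #   client: python3 repro.py read /mnt/nfs/sparse-1g
    if len(sys.argv) == 3 and sys.argv[1] == "create":
        make_sparse_file(sys.argv[2], 1 << 30)
    elif len(sys.argv) == 3 and sys.argv[1] == "read":
        digest, secs = timed_md5(sys.argv[2])
        print(f"{digest}  {secs:.1f}s")
    else:
        print("usage: repro.py create|read PATH")
```

Repeated reads on the same client may be served from its page cache, so remounting between runs keeps the timings comparable.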

The most questionable nfsd/memory-related patch in 5.13 is

 Commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")

I reverted that and now the problem is no longer there.  Gone.  90 seconds
every time.

Now the bad news: I don't know why.  That patch should be a good patch,
with a small performance improvement, particularly at very high loads
(maybe even a big improvement at very high loads).
The problem must be in alloc_pages_bulk_array(), which is a new
interface, so it is not possible to bisect.
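
To make the reasoning about allocation pauses concrete, here is a userspace toy model (not kernel code) of the pattern the commit title describes: refreshing an array of page slots with one bulk call instead of one allocation per slot. It assumes, hypothetically, that the bulk allocator fills only the empty slots, may stop early under memory pressure, and returns the total number of populated slots, and that the caller pauses and retries whenever that count comes up short. Under those assumptions, every short return from the allocator turns directly into a caller-side sleep. The names `bulk_fill`, `refill` and the `budget` parameter are invented for illustration.

```python
import time


def bulk_fill(slots, budget):
    """Toy bulk allocator: fill empty (None) slots, at most `budget` of them.

    Returns the total number of populated slots, including ones that were
    already filled before the call.
    """
    for i, page in enumerate(slots):
        if page is None and budget > 0:
            slots[i] = object()  # stand-in for a freshly allocated page
            budget -= 1
    return sum(p is not None for p in slots)


def refill(slots, budgets, pause=0.0):
    """Call bulk_fill until every slot is populated.

    `budgets` gives how many pages the allocator can supply on each call
    (simulating memory pressure). Returns the number of pauses taken.
    """
    pauses = 0
    for budget in budgets:
        if bulk_fill(slots, budget) == len(slots):
            return pauses
        pauses += 1
        time.sleep(pause)  # back off before retrying
    raise RuntimeError("allocator never satisfied the request")
```

For example, with 4 of 16 slots still holding pages and an allocator that supplies 5 pages per call, `refill(slots, [5, 5, 5])` returns 2: two short returns, two pauses. With a real nonzero `pause`, those retries are exactly where wall-clock time would disappear.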

So I might have a look at the code next week, but I've cc:ed Mel Gorman
in case he comes up with some ideas sooner.

For now, you can just revert that patch.

Thanks for all the testing you did!!  It certainly helped.

NeilBrown
