Neil:

I am actually compiling a 5.13.13 kernel with the patch that Chuck
suggested earlier right now. I am doing the full compile matching the
distro compile as I don't have a targeted kernel config ready to go
(it's been years), and I want to test like for like anyway. It should
be ready to install in the AM, my time, so I will test with that first
tomorrow and see if it resolves the issue, if not, I will report back
and then try your revert suggestion.

On the issue of memory though, my server has 16GB of memory (and free
currently shows ~1GB unused, and ~11GB in buffers/caches), so this
really shouldn't be an available memory issue, but I guess we'll find
out.

Thanks for the info.

- mike

On Thu, Aug 26, 2021 at 10:27 PM NeilBrown <neilb@xxxxxxx> wrote:
>
>
> [[Mel: if you read through to the end you'll see why I cc:ed you on this]]
>
> On Fri, 27 Aug 2021, Mike Javorski wrote:
> > I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2
> >
> > At first I thought it might be "working" because I only got freezes
> > with 4.2 at first, but I went back and re-tested (to be sure) and got
> > freezes with all 4 versions. So the nfsvers setting doesn't seem to
> > have an impact. I did verify at each pass that the 'nfsvers=' value
> > was present and correct in the mount output.
> >
> > FYI: another user posted on the archlinux reddit with a similar issue,
> > I suggested they try with a 5.12 kernel and that "solved" the issue
> > for them as well.
>
> well... I have good news and I have bad news.
>
> First the good.
> I reviewed all the symptoms again, and browsed the commits between
> working and not-working, and the only pattern that made any sense was
> that there was some issue with memory allocation. The pauses - I
> reasoned - were most likely pauses while allocating memory.
>
> So instead of testing in a VM with 2G of RAM, I tried 512MB, and
> suddenly the problem was trivial to reproduce. Specifically I created a
> (sparse) 1GB file on the test VM, exported it over NFS, and ran "md5sum"
> on the file from an NFS client. With 5.12 this reliably takes about 90 seconds
> (as it does with 2G RAM). On 5.13 and 512MB RAM, it usually takes a lot
> longer. 5, 6, 7, 8 minutes (and assorted seconds).
>
> The most questionable nfsd/ memory related patch in 5.13 is
>
> Commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>
> I reverted that and now the problem is no longer there. Gone. 90seconds
> every time.
>
> Now the bad news: I don't know why. That patch should be a good patch,
> with a small performance improvement, particularly at very high loads.
> (maybe even a big improvement at very high loads).
> The problem must be in alloc_pages_bulk_array(), which is a new
> interface, so not possible to bisect.
>
> So I might have a look at the code next week, but I've cc:ed Mel Gorman
> in case he comes up with some ideas sooner.
>
> For now, you can just revert that patch.
>
> Thanks for all the testing you did!! It certainly helped.
>
> NeilBrown
>
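
For anyone who wants to repeat the timing test Neil describes (a sparse
1GB file on the server, exported over NFS, checksummed from a client),
here is a minimal Python sketch of the two file-level steps. This is not
what Neil ran (he used "md5sum" directly). The function names and the
paths in the usage note are made up for illustration, and the NFS export
and mount are assumed to be set up separately.

```python
import hashlib
import os
import sys
import time


def make_sparse_file(path, size_bytes):
    """Create a file of size_bytes without writing any data.

    On filesystems that support holes this allocates (almost) no blocks,
    which is the assumption behind the "sparse" file in the test.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def timed_md5(path, chunk_size=1 << 20):
    """Read the whole file and return (md5 hex digest, elapsed seconds)."""
    digest = hashlib.md5()
    start = time.monotonic()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), time.monotonic() - start


# Hypothetical command line: "create PATH" on the server,
# "read PATH" on the NFS client against the mounted copy.
if __name__ == "__main__" and len(sys.argv) == 3:
    action, target = sys.argv[1], sys.argv[2]
    if action == "create":
        make_sparse_file(target, 1 << 30)  # 1 GiB, matching the test above
        print("created", target, os.path.getsize(target), "bytes")
    elif action == "read":
        md5, seconds = timed_md5(target)
        print(md5, "in", round(seconds, 1), "seconds")
```

Saved under a hypothetical name such as repro.py, it would be run as
"python3 repro.py create /srv/export/testfile" on the server and
"python3 repro.py read /mnt/nfs/testfile" on the client. A repeat read
on the same client may be served from its page cache, so remounting
between runs is probably needed for comparable timings.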