On Thu, Aug 5, 2021 at 12:15 AM Timothy Pearson <tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On further investigation, the working server had already been rolled back to 4.19.0. Apparently the issue was insurmountable in 5.x.
>
> It should be simple enough to set up a test environment out of production for 5.x, if you have any debug tips / would like to see any debug options compiled in.
>
> Thanks!
>
> ----- Original Message -----
> > From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Sent: Wednesday, August 4, 2021 7:04:16 PM
> > Subject: Re: Callback slot table overflowed
>
> > Other information that may be helpful:
> >
> > All clients are using TCP
> > arm64 clients are unaffected by the bug
> > The armel clients use very small (4k) rsize/wsize buffers
> > Prior to the upgrade from Debian Stretch, everything was working perfectly
> >
> > ----- Original Message -----
> >> From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> >> To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, August 4, 2021 7:00:20 PM
> >> Subject: Callback slot table overflowed
> >
> >> All,
> >>
> >> We've hit an odd issue after upgrading a main NFS server from Debian Stretch to
> >> Debian Buster. In both cases the 5.13.4 kernel was used, however after the
> >> upgrade none of our ARM thin clients can mount their root filesystems -- early
> >> in the boot process I/O errors are returned immediately following "Callback
> >> slot table overflowed" in the client dmesg.
> >>
> >> I am unable to find any useful information on this "Callback slot table
> >> overflowed" message, and have no idea why it is only impacting our ARM (armel)
> >> clients. Both 4.14 and 5.3 on the client side show the issue, other client
> >> kernel versions were not tested.
> >>
> >> Curiously, increasing the rsize/wsize values to 65536 or higher reduces (but
> >> does not eliminate) the number of callback overflow messages.
> >>
> >> The server is a ppc64el 64k page host, and none of our ppc64el or amd64 thin
> >> clients are experiencing any problems. Nothing of interest appears in the
> >> server message log.
> >>
> >> Any troubleshooting hints would be most welcome.

A network trace would be useful. 5.3 should have the patch "SUNRPC: Fix up backchannel slot table accounting". I believe "callback slot table overflowed" is hit when the server sends more requests than the client can handle (i.e. the client doesn't have a free slot to handle the request). A network trace would show that.

However, you said this happens when the client is trying to mount, and besides cb_null requests I'm not sure what could be happening.

> >>
>
> > Thank you!
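
To make the overflow condition I described above concrete, here is a toy Python model. It is only a sketch and not the kernel's SUNRPC code: the class name, the method names, and the slot counts are all made up for illustration. It just shows that once every slot is busy, each further callback from the server is counted as an overflow.

```python
from dataclasses import dataclass


@dataclass
class BackchannelSlotTable:
    """Hypothetical model of a client's callback slot table (not kernel code)."""

    max_slots: int      # assumed number of slots the client has available
    in_use: int = 0     # slots currently holding an unanswered callback
    overflows: int = 0  # callbacks that arrived with no free slot

    def receive_callback(self) -> bool:
        """Try to claim a slot for an incoming callback request."""
        if self.in_use >= self.max_slots:
            # No free slot: this is the "slot table overflowed" case.
            self.overflows += 1
            return False
        self.in_use += 1
        return True

    def reply_sent(self) -> None:
        """Release a slot once the client has answered the callback."""
        if self.in_use > 0:
            self.in_use -= 1


def simulate(max_slots: int, burst: int) -> tuple[int, int]:
    """Send `burst` callbacks back to back, with no replies in between."""
    table = BackchannelSlotTable(max_slots)
    accepted = sum(table.receive_callback() for _ in range(burst))
    return accepted, table.overflows


if __name__ == "__main__":
    # One slot, three back-to-back callbacks: one accepted, two overflow.
    print(simulate(max_slots=1, burst=3))
```

In a network trace, the equivalent thing to look for would be the server sending a new callback on the backchannel before the client has replied to the outstanding ones.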