----- Original Message -----
> From: "Olga Kornievskaia" <aglo@xxxxxxxxx>
> To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Friday, August 6, 2021 2:53:19 PM
> Subject: Re: Callback slot table overflowed

> On Thu, Aug 5, 2021 at 12:15 AM Timothy Pearson
> <tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On further investigation, the working server had already been rolled back to
>> 4.19.0. Apparently the issue was insurmountable in 5.x.
>>
>> It should be simple enough to set up a test environment out of production for
>> 5.x, if you have any debug tips / would like to see any debug options compiled
>> in.
>>
>> Thanks!
>>
>> ----- Original Message -----
>> > From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
>> > Sent: Wednesday, August 4, 2021 7:04:16 PM
>> > Subject: Re: Callback slot table overflowed
>>
>> > Other information that may be helpful:
>> >
>> > All clients are using TCP
>> > arm64 clients are unaffected by the bug
>> > The armel clients use very small (4k) rsize/wsize buffers
>> > Prior to the upgrade from Debian Stretch, everything was working perfectly
>> >
>> > ----- Original Message -----
>> >> From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> >> To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
>> >> Sent: Wednesday, August 4, 2021 7:00:20 PM
>> >> Subject: Callback slot table overflowed
>> >
>> >> All,
>> >>
>> >> We've hit an odd issue after upgrading a main NFS server from Debian Stretch to
>> >> Debian Buster. In both cases the 5.13.4 kernel was used, however after the
>> >> upgrade none of our ARM thin clients can mount their root filesystems -- early
>> >> in the boot process I/O errors are returned immediately following "Callback
>> >> slot table overflowed" in the client dmesg.
>> >>
>> >> I am unable to find any useful information on this "Callback slot table
>> >> overflowed" message, and have no idea why it is only impacting our ARM (armel)
>> >> clients. Both 4.14 and 5.3 on the client side show the issue, other client
>> >> kernel versions were not tested.
>> >>
>> >> Curiously, increasing the rsize/wsize values to 65536 or higher reduces (but
>> >> does not eliminate) the number of callback overflow messages.
>> >>
>> >> The server is a ppc64el 64k page host, and none of our ppc64el or amd64 thin
>> >> clients are experiencing any problems. Nothing of interest appears in the
>> >> server message log.
>> >>
>> >> Any troubleshooting hints would be most welcome.
>
> A network trace would be useful.
>
> 5.3 should have this patch "SUNRPC: Fix up backchannel slot table
> accounting". I believe "callback slot table overflowed" is hit when
> the server sent more reqs than client can handle (ie doesn't have a
> free slot to handle the request). A network trace would show that.
> However you said this happens when the client is trying to mount and
> besides cb_null requests I'm not sure what could be happening.

I'll work to get a network trace out of the test environment once it's set up.
I should however clarify that this is immediately *after* mount, when the
diskless ARM device is attempting to run early startup (i.e. reading
/etc/init.d and such).

>> >>
> > > > Thank you!
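
For anyone following along, the condition described in the quoted reply (the server sending more callback requests than the client has free slots for) can be pictured with a toy model. This is only a sketch under stated assumptions and is not the kernel's SUNRPC code: the class name, the method names, the slot counts, and the event format are all invented for illustration.

```python
class CallbackSlotTable:
    """Toy fixed-size slot table. Hypothetical model, not the kernel's SUNRPC code."""

    def __init__(self, num_slots):
        self.num_slots = num_slots   # assumed fixed number of callback slots
        self.in_use = set()          # request ids currently holding a slot
        self.overflows = 0           # requests that arrived with no free slot

    def receive(self, xid):
        """A callback request arrives: claim a slot, or count an overflow."""
        if len(self.in_use) >= self.num_slots:
            self.overflows += 1
            return False
        self.in_use.add(xid)
        return True

    def reply(self, xid):
        """The client finished handling the request, so its slot is freed."""
        self.in_use.discard(xid)


def simulate(num_slots, events):
    """Replay ("req" | "reply", xid) events and return the overflow count."""
    table = CallbackSlotTable(num_slots)
    for kind, xid in events:
        if kind == "req":
            table.receive(xid)
        else:
            table.reply(xid)
    return table.overflows


# Two requests arrive back to back before the first one is answered.
events = [("req", 1), ("req", 2), ("reply", 1), ("req", 3)]
print(simulate(1, events))
print(simulate(2, events))
```

In this model a single slot overflows as soon as a second request arrives before the first is answered, while two slots absorb the same sequence. A network trace is what would show whether the real server is actually sending requests in that pattern.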