Re: Callback slot table overflowed

On Thu, Aug 5, 2021 at 12:15 AM Timothy Pearson
<tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On further investigation, the working server had already been rolled back to 4.19.0.  Apparently the issue was insurmountable in 5.x.
>
> It should be simple enough to set up a test environment out of production for 5.x, if you have any debug tips / would like to see any debug options compiled in.
>
> Thanks!
>
> ----- Original Message -----
> > From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Sent: Wednesday, August 4, 2021 7:04:16 PM
> > Subject: Re: Callback slot table overflowed
>
> > Other information that may be helpful:
> >
> > All clients are using TCP
> > arm64 clients are unaffected by the bug
> > The armel clients use very small (4k) rsize/wsize buffers
> > Prior to the upgrade from Debian Stretch, everything was working perfectly
> >
> > ----- Original Message -----
> >> From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> >> To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, August 4, 2021 7:00:20 PM
> >> Subject: Callback slot table overflowed
> >
> >> All,
> >>
> >> We've hit an odd issue after upgrading a main NFS server from Debian Stretch to
> >> Debian Buster.  In both cases the 5.13.4 kernel was used; however, after the
> >> upgrade none of our ARM thin clients can mount their root filesystems -- early
> >> in the boot process I/O errors are returned immediately following "Callback
> >> slot table overflowed" in the client dmesg.
> >>
> >> I am unable to find any useful information on this "Callback slot table
> >> overflowed" message, and have no idea why it is only impacting our ARM (armel)
> >> clients.  Both 4.14 and 5.3 on the client side show the issue; other client
> >> kernel versions were not tested.
> >>
> >> Curiously, increasing the rsize/wsize values to 65536 or higher reduces (but
> >> does not eliminate) the number of callback overflow messages.
> >>
> >> The server is a ppc64el 64k page host, and none of our ppc64el or amd64 thin
> >> clients are experiencing any problems.  Nothing of interest appears in the
> >> server message log.
> >>
> >> Any troubleshooting hints would be most welcome.

A network trace would be useful.

The 5.3 client kernel should have this patch: "SUNRPC: Fix up backchannel
slot table accounting". I believe "Callback slot table overflowed" is hit
when the server sends more callback requests than the client can handle
(i.e. the client doesn't have a free slot to handle the request). A network
trace would show that. However, you said this happens when the client is
trying to mount, and besides CB_NULL requests I'm not sure what the server
could be sending at that point.
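
To make the slot accounting I'm describing concrete, here is a toy model.
It is only a sketch of the idea (a fixed number of slots, with an overflow
counted whenever a callback arrives and none is free). It is not the
kernel's SUNRPC code, and the class, function, and event names are made up
for illustration.

```python
# Toy model only: NOT the kernel's SUNRPC implementation.
# All names here (BackchannelSlotTable, simulate, "recv", "done") are hypothetical.

class BackchannelSlotTable:
    """Fixed pool of slots a client uses to service server callbacks."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.free_slots = num_slots
        self.overflows = 0

    def receive_callback(self):
        """A callback arrived: claim a slot, or record an overflow."""
        if self.free_slots == 0:
            self.overflows += 1   # analogous to "Callback slot table overflowed"
            return False
        self.free_slots -= 1
        return True

    def complete_callback(self):
        """The client replied to a callback: its slot becomes free again."""
        if self.free_slots < self.num_slots:
            self.free_slots += 1


def simulate(num_slots, events):
    """Replay a list of "recv"/"done" events and return the overflow count."""
    table = BackchannelSlotTable(num_slots)
    for event in events:
        if event == "recv":
            table.receive_callback()
        elif event == "done":
            table.complete_callback()
    return table.overflows


if __name__ == "__main__":
    # Two callbacks back to back with a single slot: the second one overflows.
    print(simulate(1, ["recv", "recv"]))
    # Same two callbacks, but the first completes before the second arrives.
    print(simulate(1, ["recv", "done", "recv"]))
```

In a trace, the first pattern would look like the server issuing a new
callback before the client has replied to the previous one.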

> >>
> > > Thank you!


