Re: Callback slot table overflowed

----- Original Message -----
> From: "Olga Kornievskaia" <aglo@xxxxxxxxx>
> To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Friday, August 6, 2021 2:53:19 PM
> Subject: Re: Callback slot table overflowed

> On Thu, Aug 5, 2021 at 12:15 AM Timothy Pearson
> <tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On further investigation, the working server had already been rolled back to
>> 4.19.0.  Apparently the issue was insurmountable in 5.x.
>>
>> It should be simple enough to set up a test environment out of production for
>> 5.x, if you have any debug tips / would like to see any debug options compiled
>> in.
>>
>> Thanks!
>>
>> ----- Original Message -----
>> > From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
>> > Sent: Wednesday, August 4, 2021 7:04:16 PM
>> > Subject: Re: Callback slot table overflowed
>>
>> > Other information that may be helpful:
>> >
>> > All clients are using TCP
>> > arm64 clients are unaffected by the bug
>> > The armel clients use very small (4k) rsize/wsize buffers
>> > Prior to the upgrade from Debian Stretch, everything was working perfectly
>> >
>> > ----- Original Message -----
>> >> From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> >> To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
>> >> Sent: Wednesday, August 4, 2021 7:00:20 PM
>> >> Subject: Callback slot table overflowed
>> >
>> >> All,
>> >>
>> >> We've hit an odd issue after upgrading a main NFS server from Debian Stretch to
>> >> Debian Buster.  In both cases the 5.13.4 kernel was used, however after the
>> >> upgrade none of our ARM thin clients can mount their root filesystems -- early
>> >> in the boot process I/O errors are returned immediately following "Callback
>> >> slot table overflowed" in the client dmesg.
>> >>
>> >> I am unable to find any useful information on this "Callback slot table
>> >> overflowed" message, and have no idea why it is only impacting our ARM (armel)
>> >> clients.  Both 4.14 and 5.3 on the client side show the issue, other client
>> >> kernel versions were not tested.
>> >>
>> >> Curiously, increasing the rsize/wsize values to 65536 or higher reduces (but
>> >> does not eliminate) the number of callback overflow messages.
>> >>
>> >> The server is a ppc64el 64k page host, and none of our ppc64el or amd64 thin
>> >> clients are experiencing any problems.  Nothing of interest appears in the
>> >> server message log.
>> >>
>> >> Any troubleshooting hints would be most welcome.
> 
> A network trace would be useful.
> 
> 5.3 should have this patch "SUNRPC: Fix up backchannel slot table
> accounting". I believe "callback slot table overflowed" is hit when
> the server sent more reqs than client can handle (ie doesn't have a
> free slot to handle the request). A network trace would show that.
> However you said this happens when the client is trying to mount and
> besides cb_null requests I'm not sure what could be happening.

I'll work on getting a network trace from the test environment once it's set up.  I should clarify, however, that this happens immediately *after* mount, when the diskless ARM device is attempting to run early startup (i.e. reading /etc/init.d and such).
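
For anyone following along, here is a rough toy model of the condition as described above: the client has a fixed number of backchannel slots, and a callback that arrives while every slot is busy is counted as an overflow.  This is only an illustration of that description, not the actual SUNRPC code; the class name, method names, and the slot count of 1 are all made up for the example.

```python
class BackchannelSlotTable:
    """Toy model (hypothetical): fixed number of slots for in-flight callbacks."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.in_flight = set()   # xids of callbacks currently holding a slot
        self.overflows = 0       # callbacks that arrived with no free slot

    def receive(self, xid):
        """Server sends a callback; return True if a slot was available."""
        if len(self.in_flight) >= self.num_slots:
            self.overflows += 1  # the "slot table overflowed" case
            return False
        self.in_flight.add(xid)
        return True

    def complete(self, xid):
        """Client finished handling the callback; free its slot."""
        self.in_flight.discard(xid)


table = BackchannelSlotTable(num_slots=1)   # slot count is an assumption
# Callback 2 arrives before callback 1 has completed, so it finds no slot.
results = [table.receive(xid) for xid in (1, 2)]
table.complete(1)
# Once the slot is freed, the next callback is accepted again.
results.append(table.receive(3))
print(results, table.overflows)
```

If the real behaviour matches this picture, a network trace should show the server issuing a new callback before the client has replied to the previous one.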

>> >>
>> >> Thank you!


