Re: nconnect & repeating BIND_CONN_TO_SESSION?

The server enforces a limit on the total number of connections in
net/sunrpc/svc.c:svc_check_conn_limits().  Maybe that's what you're
hitting.

By default it's (number of threads + 3) * 20.  You can bump the number
of nfsd threads or change /proc/fs/nfsd/max_connections.
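
For reference, the core of that check looks roughly like this (a
paraphrased sketch from memory, not an exact quote of the upstream
code):

	static void svc_check_conn_limits(struct svc_serv *serv)
	{
		/* sv_maxconn == 0 means "use the thread-based default" */
		unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
					(serv->sv_nrthreads + 3) * 20;

		if (serv->sv_tmpcnt > limit) {
			/* logs "too many open connections", then picks the
			 * oldest temporary socket on sv_tempsocks, marks it
			 * XPT_CLOSE, and enqueues it to be torn down */
		}
	}

So once the count of connected (temporary) transports goes past the
limit, the oldest connection gets closed out from under the client.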

Weird that your limit would be 80, though, which is the number you'd
expect if the server was running with just one thread: (1 + 3) * 20 = 80.

The only other rpc server I can think of that's involved here is the NFS
client's callback server, which does have only one thread, but
nfs_callback_create_svc() does:

	/* As there is only one thread we need to over-ride the default
	 * maximum of 80 connections
	 */
	serv->sv_maxconn = 1024;

and has since the beginning.  I can't see why that wouldn't work.  If
80's really your limit, though, that seems like an odd coincidence.
Have you seen that "too many connections" warning in the client logs?

--b.

On Mon, Feb 07, 2022 at 03:21:41PM +0000, Daire Byrne wrote:
> Trond kindly posted a patch to fix the noresvport mount issue with
> v4.2 and recent kernels.
> 
> I tested it quickly and verified that ports greater than 1024 were
> being used as expected, but it seems the same issue persists. It still
> feels like it's related to the total number of connections across all
> the server + nconnect combinations.
> 
> So I can have 20 servers mounted with nconnect=4, or 10 servers
> mounted with nconnect=8, but with any combination that pushes the
> total number of connections on the client past that, at least one of
> the servers ends up in a state where it's just sending a
> bind_conn_to_session with every operation.
> 
> I'll see if I can discern anything from a packet capture (as
> suggested earlier by Rick), but it's hard to reproduce on demand at a
> predictable time. My theory is that maybe there is a timeout on the
> callback, and that adding more connections just adds more
> load/throughput, making a timeout more likely.
> 
> My workaround for the moment is to simply use NFSv3 instead of NFSv4,
> which might be a better choice for this kind of workload anyway.
> 
> Daire
> 
> 
> On Mon, 24 Jan 2022 at 12:33, Daire Byrne <daire@xxxxxxxx> wrote:
> >
> > On Sun, 23 Jan 2022 at 22:42, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > > > I suspect it's just more recent kernels that have lost the ability to
> > > > use v4+noresvport
> > >
> > > Yes, thanks for checking that.  Let us know if you narrow down the
> > > kernel any more.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=215526
> >
> > I think it stopped working somewhere between v5.11 and v5.12. I'll try
> > and bisect it this week.
> >
> > Daire


