On Thu, Feb 21, 2019 at 12:35:46PM +0000, James Pearson wrote:
> On Thu, 21 Feb 2019 at 04:18, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> >
> > On Wed, Feb 20, 2019 at 11:28:53AM +0000, James Pearson wrote:
> > > On a very busy NFSv3 server (running CentOS 6), we recently upped the
> > > nfsd thread count to 1024 - but this caused client mount requests over
> > > UDP to fail.
> > >
> > > We configure all our clients to use TCP for NFS mounts, but the
> > > automounter (automountd) on MacOS (up to version MacOS 10.12) sends a
> > > 'null call' to the NFS server over UDP before attempting the mount -
> > > but the server appears to ignore any UDP requests - and the automount
> > > fails
> >
> > By the way, you might also just turn off UDP.  (Run rpc.nfsd with
> > the -U option.)  Hopefully MacOS can handle that case.
>
> We tried that - but when we restarted nfs, some existing mounts hung
> (not sure why, as we should be just using TCP everywhere) ... although
> when tested on a test server, the MacOS automounter worked fine

It's probably not a good idea to turn off UDP while there are existing
mounts, even if the mounts are supposedly TCP.  At a guess, maybe one
of the sideband protocols (NLM or NSM) is using UDP and that's causing
problems.

> I tried your patch - it doesn't apply 'as is' on a CentOS 6 kernel -
> but with a bit of manual hacking, I can get it to fit

Whoops, I missed at first that you were on an older kernel.

> However, the net/sunrpc/svcsock.c in these kernels has an extra call
> to svc_sock_setbufsize() :
>
>         /* Initialize the socket */
>         if (sock->type == SOCK_DGRAM)
>                 svc_udp_init(svsk, serv);
>         else {
>                 /* initialise setting must have enough space to
>                  * receive and respond to one request.
>                  */
>                 svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
>                                         4 * serv->sv_max_mesg);
>                 svc_tcp_init(svsk, serv);
>         }
>
> I tried replacing that svc_sock_setbufsize() with:
>
>         svc_sock_setbufsize(svsk, 4);
>
> but that just caused the whole machine to lock up shortly after
> sunrpc.ko was loaded ...

Looks like it's trying to dereference svsk->xpt_server before
svc_tcp_init() has initialized it.

> However, things seem to work fine if I call a copy of the original
> svc_sock_setbufsize() at that point in the code with the original args
> ...
>
> i.e. mounts over UDP (and MacOS automounts) now work with nfsd threads
> over 1017 (I tried 2048 ... and it worked)

OK, I think that's evidence enough that this overflow was the problem
you were hitting, so I'll send that patch upstream.

> Incidentally, I came across an old thread on this list that appears to
> be related to this issue (well, it mentions a 1020 thread limit and
> buffer size wraps in svc_sock_setbufsize() ???) :
>
> https://www.spinics.net/lists/linux-nfs/msg34927.html
>
> ... but I'm not sure what the result of that was (nor if it is
> actually related to the issue here) ?

Yeah, see https://www.spinics.net/lists/linux-nfs/msg34932.html.

So, I knew about this problem and even made a patch before and then
somehow dropped it.  I'm not sure how that happened.  Anyway, I have it
queued up for 5.1 now, so that shouldn't happen again.

--b.
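
A rough userspace sketch of the kind of wrap discussed above, in case it
helps anyone reading the archive.  It is not the kernel code.  Three
things in it are assumptions that do not come from this thread: that the
UDP buffer request is (threads + 3) messages, that one message is about
1 MiB plus one 4 KiB page, and that the value is doubled before being
stored in a 32-bit signed int.  The helper names are made up for the
illustration.

```c
#include <stdint.h>

/* ASSUMPTION: per-message size of roughly 1 MiB plus one 4 KiB page. */
#define ASSUMED_MAX_MESG    (1024u * 1024u + 4096u)
/* ASSUMPTION: the request is sized for (nthreads + 3) messages. */
#define ASSUMED_EXTRA_SLOTS 3u

/* Buffer size that was meant, computed in 64 bits so it cannot wrap.
 * ASSUMPTION: the value is doubled before it is stored. */
static int64_t wanted_bufsize(unsigned int nthreads)
{
        return (int64_t)(nthreads + ASSUMED_EXTRA_SLOTS) * ASSUMED_MAX_MESG * 2;
}

/* What is left of that value once it is squeezed into a 32-bit signed
 * int: keep the low 32 bits, then reinterpret them as two's complement.
 * The result is returned as int64_t only to keep the arithmetic portable. */
static int64_t stored_bufsize(unsigned int nthreads)
{
        uint32_t low = (uint32_t)wanted_bufsize(nthreads);

        if (low >= UINT32_C(0x80000000))
                return (int64_t)low - INT64_C(0x100000000);
        return (int64_t)low;
}
```

Under those assumptions, stored_bufsize(1017) still equals
wanted_bufsize(1017), while stored_bufsize(1018) comes out negative,
which would be consistent with the 1017-thread threshold reported above.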