On Tue, 16 Nov 2010, J. Bruce Fields wrote:

> On Mon, Nov 15, 2010 at 06:43:52PM +0000, Mark Hills wrote:
> > I am looking into an issue of hanging clients to a set of NFS
> > servers, on a large HPC cluster.
> >
> > My investigation took me to the RPC code, svc_create_socket().
> >
> > 	if (protocol == IPPROTO_TCP) {
> > 		if ((error = kernel_listen(sock, 64)) < 0)
> > 			goto bummer;
> > 	}
> >
> > A fixed backlog of 64 connections at the server seems like it could
> > be too low on a cluster like this, particularly when the protocol
> > opens and closes the TCP connection.
> >
> > I wondered what the rationale is behind this number, particularly as
> > it is a fixed value. Perhaps there is a reason why this has no effect
> > on nfsd, or is this a FAQ for people on large systems?
> >
> > The servers show overflow of a listening queue, which I imagine is
> > related.
> >
> > 	$ netstat -s
> > 	[...]
> > 	TcpExt:
> > 	    6475 times the listen queue of a socket overflowed
> > 	    6475 SYNs to LISTEN sockets ignored
> >
> > The affected servers are old, kernel 2.6.9. But this limit of 64 is
> > consistent across that and the latest kernel source.
>
> Looks like the last time that was touched was 8 years ago, by Neil
> (below, from historical git archive).
>
> I'd be inclined to just keep doubling it until people don't complain,
> unless it's very expensive. (How much memory (or whatever else) does a
> pending connection tie up?)

Perhaps SOMAXCONN could also be appropriate.

> The clients should be retrying, though, shouldn't they?

I think so, but a quick glance at net/sunrpc/clnt.c suggests the
timeouts are fixed, not randomised. With nothing to smooth out the load
from a large number of (identical) clients, they could potentially
continue this cycle for some time.

I may be looking in the wrong client code here for a client TCP
connection, though; perhaps someone with more experience can comment. I
hope to investigate further tomorrow.

-- 
Mark
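
For illustration of the backlog point above, a minimal sketch (not a
patch from this thread; the "listen_backlog" parameter name and its
default are invented here): the idea is simply to replace the
hard-coded 64 with something tunable, defaulting to SOMAXCONN.

	#include <linux/net.h>		/* kernel_listen() */
	#include <linux/socket.h>	/* SOMAXCONN */
	#include <linux/moduleparam.h>

	/* Hypothetical tunable; not an existing sunrpc parameter. */
	static int listen_backlog = SOMAXCONN;
	module_param(listen_backlog, int, 0644);
	MODULE_PARM_DESC(listen_backlog,
			 "TCP listen backlog for RPC service sockets");

	/* ...then in svc_create_socket(), in place of the fixed value: */
	if (protocol == IPPROTO_TCP) {
		if ((error = kernel_listen(sock, listen_backlog)) < 0)
			goto bummer;
	}

Whether SOMAXCONN (128 on kernels of that era) is itself enough for
thousands of clients reconnecting at once is a separate question; the
point is only that the value becomes visible and adjustable.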
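
On the retry side, the kind of randomised ("jittered") exponential
backoff hinted at above could look roughly like the following. This is
a userspace illustration only; it is not taken from net/sunrpc/clnt.c,
and every name and constant in it is invented.

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	/*
	 * Delay before reconnect attempt 'attempt' (0-based): the upper
	 * bound doubles each time, capped at 60s, and the actual wait is
	 * drawn at random below it so identical clients spread out.
	 */
	static unsigned int reconnect_delay_ms(unsigned int attempt)
	{
		unsigned int base = 500;	/* first retry within ~0.5s */
		unsigned int cap = 60 * 1000;	/* never wait more than 60s */
		unsigned int ceiling;

		if (attempt > 16)
			attempt = 16;		/* avoid shift overflow */
		ceiling = base << attempt;
		if (ceiling > cap)
			ceiling = cap;

		/* Random point in [0, ceiling). */
		return (unsigned int)(rand() / (RAND_MAX + 1.0) * ceiling);
	}

	int main(void)
	{
		srand((unsigned int)(time(NULL) ^ getpid()));
		for (unsigned int i = 0; i < 6; i++)
			printf("attempt %u: wait %u ms\n",
			       i, reconnect_delay_ms(i));
		return 0;
	}

The property that matters for the listen-queue overflow is that a large
number of clients retrying a refused connection no longer send their
SYNs at the same instant; each backs off to a different point below a
growing ceiling.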