On Mon, Apr 14, 2008 at 11:48:33PM -0500, Tom Tucker wrote:
>
> Maybe this is a TCP_BACKLOG issue?

So, looking around....  There seems to be a global limit in
/proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024?); might be worth
seeing what happens if that's increased, e.g., with

	echo 2048 >/proc/sys/net/ipv4/tcp_max_syn_backlog

Though each client does have to make more than one tcp connection, I
wouldn't expect it to be making more than one at a time, so with 1340
clients, and assuming the requests are spread out at least a tiny bit,
I would have thought 1024 would be enough.

Oh, but: grepping the glibc rpc code, it looks like it calls listen()
with second argument SOMAXCONN == 128.  You can confirm that by
strace'ing rpc.mountd -F and looking for the listen call (there's a
sample command at the end of this mail).  And that socket is shared
between all the mountd processes, so I guess that's the real limit.  I
don't see an easy way to adjust that.  You'd also need to increase
/proc/sys/net/core/somaxconn first.

But none of this explains why we'd see connections stuck in CLOSE_WAIT
indefinitely?

--b.

>
> BTW, with that many mounts won't you run out of "secure" ports (< 1024),
> so you'll need to use 'insecure' as a mount option.
>
>
> On Fri, 2008-04-11 at 19:07 -0400, J. Bruce Fields wrote:
> > On Thu, Apr 10, 2008 at 02:12:58PM +0200, Carsten Aulbert wrote:
> > > Hi all,
> > >
> > > we have a pretty extreme problem here and I try to figure out how to
> > > get it done right.
> > >
> > > We have a large cluster consisting of 1340 compute nodes which have an
> > > automount directory that will subsequently trigger an NFS mount
> > > (read-only):
> > >
> > > $ ypcat auto.data
> > > -fstype=nfs,nfsvers=3,hard,intr,rsize=8192,wsize=8192,tcp &:/data
> > >
> > > $ grep auto.data /etc/auto.master
> > > /atlas/data yp:auto.data --timeout=5
> > >
> > > So far so good.
> > >
> > > When submitting 1000 jobs just doing an md5sum of the very same file
> > > from one single data server, I see very weird effects.
> > >
> > > In the standard set-up many connections get into the box (tcp
> > > connection status SYN_RECV) but those fall over after some time and
> > > stay in CLOSE_WAIT state until I restart the nfs-kernel-server.
> > > Typically that looks like (netstat -an):
> >
> > That's interesting!  But I'm not sure how to figure this out.
> >
> > Is it possible to get a network trace that shows what's going on?
> >
> > What happens on the clients?
> >
> > What kernel version are you using?
> >
> > --b.
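For the network trace question just above: a capture taken on the
server while the problem reproduces should be enough.  Something along
these lines ought to do it -- the interface name and output file are
placeholders, and 687 is just the mountd port shown in the netstat
output below:

	# capture mountd and nfs traffic on the server; adjust the
	# interface, ports, and output file to the actual setup
	tcpdump -i eth0 -s 0 -w /tmp/nfs-mount.pcap 'port 687 or port 2049'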
> > >
> > > tcp        0      0 10.20.10.14:687    10.10.2.87:799     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.4.1:823      SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.65:656     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.30:650     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.0.71:789     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.4:602      SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.1:967      SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.3.66:915     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.0.55:620     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.41:835     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.2.29:958     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.12:998     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.30:651     SYN_RECV
> > > tcp        0      0 10.20.10.14:687    10.10.1.4:601      SYN_RECV
> > > tcp        0      0 10.20.10.14:2049   10.10.1.19:846     ESTABLISHED
> > > tcp       45      0 10.20.10.14:687    10.10.0.68:979     CLOSE_WAIT
> > > tcp       45      0 10.20.10.14:687    10.10.3.83:680     CLOSE_WAIT
> > > tcp       89      0 10.20.10.14:687    10.10.0.79:604     CLOSE_WAIT
> > > tcp        0      0 10.20.10.14:2049   10.10.2.6:676      ESTABLISHED
> > > tcp       45      0 10.20.10.14:687    10.10.2.56:913     CLOSE_WAIT
> > > tcp       45      0 10.20.10.14:687    10.10.0.60:827     CLOSE_WAIT
> > > tcp        0      0 10.20.10.14:2049   10.10.3.55:778     ESTABLISHED
> > > tcp       45      0 10.20.10.14:687    10.10.2.86:981     CLOSE_WAIT
> > > tcp       45      0 10.20.10.14:687    10.10.9.13:792     CLOSE_WAIT
> > > tcp       89      0 10.20.10.14:687    10.10.2.93:728     CLOSE_WAIT
> > > tcp       45      0 10.20.10.14:687    10.10.0.20:742     CLOSE_WAIT
> > > tcp       45      0 10.20.10.14:687    10.10.3.44:982     CLOSE_WAIT
> > >
> > > I played with different numbers of nfsd (ranging from 8-1024) and
> > > increasing the number of threads for rpc.mountd from 1 to 64, in quite
> > > a few combinations, but so far I have not found a consistent set of
> > > parameters where 1000 nodes are able to read this file at the same
> > > time.
> > >
> > > Any ideas from anyone or do you need more input from me?
> > >
> > > TIA
> > >
> > > Carsten
> > >
> > > PS: Please Cc me, I'm not yet subscribed.
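Also, next time it gets into this state, a quick summary of connection
states on the mountd port might help narrow it down.  Assuming mountd
is still on port 687 as in the netstat output above, something like

	# count connections to the mountd port, grouped by TCP state
	netstat -tan | awk '$4 ~ /:687$/ {print $6}' | sort | uniq -c

run a few times while the jobs are starting should show whether the
listen backlog is overflowing (lots of SYN_RECV) or whether connections
are mostly piling up in CLOSE_WAIT.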
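And for the strace check mentioned above: something along these lines
should show the backlog that mountd actually passes to listen().  This
assumes strace is available on the server and that you can stop the
running mountd and start it by hand in the foreground for a moment:

	# run mountd in the foreground under strace and look for listen()
	strace -f rpc.mountd -F 2>&1 | grep listen

If the second argument really is 128, that would be the effective limit
on that socket regardless of the tcp_max_syn_backlog setting.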