> -----Original Message-----
> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] On Behalf Of Simon Kirby
> Sent: Wednesday, March 06, 2013 4:52 AM
> To: linux-nfs@xxxxxxxxxxxxxxx
> Subject: NFSv3 TCP socket stuck when all slots used and server goes away
>
> We had an issue with a Pacemaker/CRM HA-NFSv3 setup where one
> particular export hit an XFS locking issue on one node and got completely
> stuck. Upon failing over, service recovered for all clients that hadn't hit
> the mount since the issue occurred, but almost all of the usual clients
> (which also commonly statfs it as a monitoring check) sat forever
> (>20 minutes) without reconnecting.
>
> It seems that the clients filled the RPC slots with requests over the TCP
> socket to the NFS VIP and the server ack'd everything at the TCP layer, but
> was not able to reply to anything due to the FS locking issue. When we failed
> over the VIP to the other node, service was restored, but the clients stuck
> this way continued to sit with nothing to tickle the TCP layer. netstat shows a
> socket with no send-queue, in ESTABLISHED state, and with no timer
> enabled:
>
> tcp 0 0 c:724 s:2049 ESTABLISHED - off (0.00/0/0)
>
> The mountpoint options used are: rw,hard,intr,tcp,vers=3
>
> The export options are:
> rw,async,hide,no_root_squash,no_subtree_check,mp
>
> Is this expected behaviour? I suspect if TCP keepalive were enabled, the
> socket would eventually get torn down as soon as the client tries to send
> something to the (effectively rebooted / swapped) NFS server and gets an
> RST. However, as-is, there seems to be nothing here that would eventually
> cause anything to happen. Am I missing something?

Which client? Did the server close the connection?

Cheers
  Trond
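
For readers unfamiliar with the keepalive mechanism mentioned above, here is a minimal userspace sketch of enabling it on a TCP socket. It only illustrates the socket options involved; it is not how the kernel RPC client sets up its transport socket, and the helper names and the timing values (60 s idle, 10 s interval, 5 probes) are assumptions chosen for the example. With keepalive on, an idle connection is probed periodically, so a peer that no longer knows about the connection has a chance to answer with an RST.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive probes on an existing socket.

    idle/interval/count are illustrative values, not recommendations.
    """
    # Portable switch: ask the TCP stack to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning knobs. They are not exposed on every platform,
    # so each one is guarded; system-wide defaults apply otherwise.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

A socket set up this way would be expected to show a keepalive timer in the netstat timer column rather than "off", assuming the platform reports it.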