On Sat, Oct 14, 2017 at 09:59:49AM -0500, Ziemowit Pierzycki wrote:
> Hi,
> I have two NFS servers that appear to have the same issue.  They're
> both Fedora 25 based, and none of the clients can connect; they just
> keep retrying forever.  If I restart the server it works for a little
> while before the same thing happens again.
> 
> Turning on debugging shows the following:
> 
> [171565.851530] svc: socket ffff940c3a5ef000(inet ffff940d7db626c0), busy=1
> [171566.026535] svc: socket ffff940d7ac0c000(inet ffff940d7db87440), busy=1
> [171570.032880] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> [171576.915841] svc: socket ffff94143ce1d000(inet ffff940d7db62e80), busy=1
> [171578.360395] svc: socket ffff94128bba4000(inet ffff940d999b8f80), busy=1
> [171578.828178] svc: socket ffff94143919b000(inet ffff940d7db83640), busy=1
> [171578.828198] svc: socket ffff94143919b000(inet ffff940d7db83640), busy=1
> [171579.930641] svc: socket ffff940d89e71000(inet ffff940d999b8000), busy=1
> [171579.930662] svc: socket ffff940d89e71000(inet ffff940d999b8000), busy=1
> [171579.930680] svc: socket ffff940d89e71000(inet ffff940d999b8000), busy=1
> [171580.024655] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> [171580.913639] svc: socket ffff940d3f539000(inet ffff940d7db65d00), busy=1
> [171582.400198] NFSD: laundromat service - starting
> [171582.400202] NFSD: laundromat_main - sleeping for 90 seconds
> [171589.539121] svc: socket ffff940ac592d000(inet ffff940d97b55540), busy=1
> [171589.539284] svc: socket ffff940ac592d000(inet ffff940d97b55540), busy=1
> [171590.040366] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> [171590.591191] svc: socket ffff94128bba1000(inet ffff940d7db607c0), busy=1
> [171598.027702] svc: socket ffff94143919b000(inet ffff940d7db83640), busy=1
> [171599.863801] svc: socket ffff94128bba4000(inet ffff940d999b8f80), busy=1
> [171599.863836] svc: socket ffff94128bba4000(inet ffff940d999b8f80), busy=1
> [171600.056109] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> [171604.354706] svc: socket ffff940ac592d000(inet ffff940d97b55540), busy=1
> [171608.585185] svc: socket ffff94057a6da000(inet ffff940d999bdd00), busy=1
> [171609.498365] svc: socket ffff940c3a5ef000(inet ffff940d7db626c0), busy=1
> [171609.790704] svc: socket ffff94128bba1000(inet ffff940d7db607c0), busy=1
> [171610.071868] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> [171616.141902] svc: socket ffff940d7ac08000(inet ffff940d7db81f00), busy=1
> [171620.055620] svc: socket ffff940d9c253000(inet ffff940d999b9f00), busy=1
> 
> Then there is a single nfsd process that has a very high load:
> 
> # cat /proc/4192/stack
> [<ffffffffffffffff>] 0xffffffffffffffff

Not sure what that means.  A sysrq-t dump might help.  (echo
t>/proc/sysrq-trigger, then show us what's dumped to the logs.)

--b.
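A minimal way to capture that, assuming root on the server (the output
file name is just an example):

  echo t > /proc/sysrq-trigger    # dump every task's kernel stack to the kernel log
  dmesg > sysrq-t.txt             # or: journalctl -k > sysrq-t.txt

With many nfsd threads the full dump can overflow the kernel ring
buffer; if the traces look truncated, booting with a larger
log_buf_len= parameter avoids that.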
> 
> # rpcinfo
>    program version netid     address                service    owner
>     100000    4    tcp6      ::.0.111               portmapper superuser
>     100000    3    tcp6      ::.0.111               portmapper superuser
>     100000    4    udp6      ::.0.111               portmapper superuser
>     100000    3    udp6      ::.0.111               portmapper superuser
>     100000    4    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    3    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    2    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    4    udp       0.0.0.0.0.111          portmapper superuser
>     100000    3    udp       0.0.0.0.0.111          portmapper superuser
>     100000    2    udp       0.0.0.0.0.111          portmapper superuser
>     100000    4    local     /run/rpcbind.sock      portmapper superuser
>     100000    3    local     /run/rpcbind.sock      portmapper superuser
>     100024    1    udp       0.0.0.0.131.70         status     29
>     100024    1    tcp       0.0.0.0.221.245        status     29
>     100024    1    udp6      ::.170.79              status     29
>     100024    1    tcp6      ::.143.15              status     29
>     100005    1    udp       0.0.0.0.78.80          mountd     superuser
>     100005    1    tcp       0.0.0.0.78.80          mountd     superuser
>     100005    1    udp6      ::.78.80               mountd     superuser
>     100005    1    tcp6      ::.78.80               mountd     superuser
>     100005    2    udp       0.0.0.0.78.80          mountd     superuser
>     100005    2    tcp       0.0.0.0.78.80          mountd     superuser
>     100005    2    udp6      ::.78.80               mountd     superuser
>     100005    2    tcp6      ::.78.80               mountd     superuser
>     100005    3    udp       0.0.0.0.78.80          mountd     superuser
>     100005    3    tcp       0.0.0.0.78.80          mountd     superuser
>     100005    3    udp6      ::.78.80               mountd     superuser
>     100005    3    tcp6      ::.78.80               mountd     superuser
>     100003    3    tcp       0.0.0.0.8.1            nfs        superuser
>     100003    4    tcp       0.0.0.0.8.1            nfs        superuser
>     100227    3    tcp       0.0.0.0.8.1            nfs_acl    superuser
>     100003    3    udp       0.0.0.0.8.1            nfs        superuser
>     100227    3    udp       0.0.0.0.8.1            nfs_acl    superuser
>     100003    3    tcp6      ::.8.1                 nfs        superuser
>     100003    4    tcp6      ::.8.1                 nfs        superuser
>     100227    3    tcp6      ::.8.1                 nfs_acl    superuser
>     100003    3    udp6      ::.8.1                 nfs        superuser
>     100227    3    udp6      ::.8.1                 nfs_acl    superuser
>     100021    1    udp       0.0.0.0.231.220        nlockmgr   superuser
>     100021    3    udp       0.0.0.0.231.220        nlockmgr   superuser
>     100021    4    udp       0.0.0.0.231.220        nlockmgr   superuser
>     100021    1    tcp       0.0.0.0.145.133        nlockmgr   superuser
>     100021    3    tcp       0.0.0.0.145.133        nlockmgr   superuser
>     100021    4    tcp       0.0.0.0.145.133        nlockmgr   superuser
>     100021    1    udp6      ::.188.96              nlockmgr   superuser
>     100021    3    udp6      ::.188.96              nlockmgr   superuser
>     100021    4    udp6      ::.188.96              nlockmgr   superuser
>     100021    1    tcp6      ::.173.23              nlockmgr   superuser
>     100021    3    tcp6      ::.173.23              nlockmgr   superuser
>     100021    4    tcp6      ::.173.23              nlockmgr   superuser
> 
> And all the clients are trying to reconnect:
> 
> nfs: server elkpinfnas03.corp.vibes.com OK
> nfs: server elkpinfnas03.corp.vibes.com OK
> nfs: server elkpinfnas03.corp.vibes.com not responding, still trying
> 
> Any help would be greatly appreciated.  Thank you.