Re: NFS Server Not Responding after hw change (svc: transport busy, not enqueued)


On Tue, Jan 05, 2010 at 10:25:34PM -0500, Scott Sturdivant wrote:
> I'm not sure what level of detail is appropriate here, so I apologize in  
> advance.
>
> This past weekend I swapped some hardware on my NFS server: a new
> motherboard, processor, and RAM, and I am now using the onboard LAN.  My
> hard drives did not change, and after booting up everything seemed to be
> working just fine.  The problems start when my clients mount a share and
> try to access a file.  The server is running Ubuntu 9.10 32-bit server
> edition.  uname -a:
>
>   Linux blargh-server 2.6.31-16-generic-pae #53-Ubuntu SMP Tue Dec 8 05:20:21 UTC 2009 i686 GNU/Linux
>
> The 1:1.2.0-2ubuntu8 nfs-kernel-server package is installed.

On a quick skim I don't see an obvious reason; one approach (if you're
*positive* there weren't also any software changes) might be just to try
swapping the hardware back (starting with the LAN?) and see if you can
reliably turn the problem on/off with just one hardware change.

--b.

>
> On the clients, I can mount the shares with
> "mount -t nfs file-server:/home/scott/Videos/ ~/Videos".  The server's
> syslog shows:
>
>   Jan 5 07:49:23 file-server mountd[1606]: authenticated mount request from 192.168.1.100:802 for /home/scott/Videos/ (/home/scott/Videos/)
>
> I can then "ls" that directory and retrieve the directory listing.  But if
> I access a file (cp ~/Videos/*.avi /tmp), only part of a single file
> copies before the I/O blocks.  Eventually dmesg on the client shows this
> error:
>
>   nfs: server file-server not responding, still trying
>
> At this point, executing 'rpcinfo -p file-server' from the client still  
> seems to indicate that NFS is running just fine on the server.
>
> (scott) file-client:~
> 507 -> rpcinfo -p file-server
>    program vers proto   port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  41238  status
>     100024    1   tcp  55833  status
>     100021    1   udp  38360  nlockmgr
>     100021    3   udp  38360  nlockmgr
>     100021    4   udp  38360  nlockmgr
>     100021    1   tcp  59774  nlockmgr
>     100021    3   tcp  59774  nlockmgr
>     100021    4   tcp  59774  nlockmgr
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100005    1   udp  42451  mountd
>     100005    1   tcp  57648  mountd
>     100005    2   udp  42451  mountd
>     100005    2   tcp  57648  mountd
>     100005    3   udp  42451  mountd
>     100005    3   tcp  57648  mountd
>
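Keep in mind that `rpcinfo -p` only asks the portmapper what is registered, so a listing like this shows that nfs (program 100003) and mountd (program 100005) are registered over TCP and UDP, not that large replies actually make it back to the client. If you want to check registrations from a script rather than by eye, here is a minimal Python sketch; `parse_rpcinfo` and `nfs_registered` are hypothetical helpers, and they assume the column layout shown above (program, version, protocol, port, name).

```python
from collections import defaultdict

NFS_PROGRAM = 100003  # ONC RPC program number listed as "nfs" above


def parse_rpcinfo(text):
    """Map (program, proto) -> {"name", "versions", "ports"} from `rpcinfo -p` output."""
    services = defaultdict(lambda: {"name": "", "versions": set(), "ports": set()})
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; skip the header and blanks.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, version, proto, port = int(fields[0]), int(fields[1]), fields[2], int(fields[3])
        entry = services[(program, proto)]
        entry["versions"].add(version)
        entry["ports"].add(port)
        if len(fields) > 4:
            entry["name"] = fields[4]
    return dict(services)


def nfs_registered(services, version=3, proto="tcp"):
    """True if nfs is registered at `version` over `proto`.

    This only reflects the portmapper's table; it says nothing about
    whether READ replies are getting through.
    """
    entry = services.get((NFS_PROGRAM, proto))
    return entry is not None and version in entry["versions"]
```

Feed it the captured text of `rpcinfo -p file-server`; for the listing above, `nfs_registered(parse_rpcinfo(text))` would be true, which matches what the client already sees.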
> As you can see, though, the I/O is blocked:
>
> (scott) file-client:~
> 504 -> ps aux | grep " D"
> scott     4405  0.0  0.0   3428   920 pts/1    D+   08:04   0:00 cp Videos/*.avi /tmp/
>
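`ps aux | grep " D"` works as a quick filter, but it matches any line that happens to contain " D". A slightly more precise sketch reads the state letter straight from `/proc/<pid>/stat` (Linux only); `D` there means uninterruptible sleep, which is where a process typically sits while the NFS client keeps retrying a server that has stopped answering. The function names are hypothetical helpers, not part of any existing tool.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split around the *last* ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]  # single letter, e.g. 'D', 'S', 'R'
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Yield (pid, comm) for each process whose state is 'D'."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip /proc/self, /proc/meminfo and other non-PID entries
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()/read()
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

On the client above, the hung `cp` would be expected to show up as `4405 cp` while the copy is stalled.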
> On the server's end, I do not see any errors in dmesg, syslog, or
> messages, at least not until I increased the logging level using
> rpcdebug.  (I'm not sure I did this correctly, but I ran
> 'rpcdebug -m <module> -s all' for each of the modules listed by
> 'rpcdebug -vh'.)
>
> In the snippet below from the server's dmesg, there are many
> "svc: transport %p busy, not enqueued" messages:
>
> [ 6588.481185] nfsd_dispatch: vers 3 proc 6
> [ 6588.481211] nfsd: READ(3) 36: 01070001 0141401d 00000000 e12f98aa 1c4965f0 0d4e5b93 131072 bytes at 22282240
> [ 6588.481231] nfsd: fh_verify(36: 01070001 0141401d 00000000 e12f98aa 1c4965f0 0d4e5b93)
> [ 6588.481747] svc: socket f45f8e00 sendto([ed215000 132... ], 131204) = 131204 (addr 192.168.1.100, port=915)
> [ 6588.481776] svc: socket f45f8e00 recvfrom(f45f8f70, 0) = 4
> [ 6588.481792] svc: TCP record, 156 bytes
> [ 6588.481821] svc: server f6ccd000 waiting for data (to = 900000)
> [ 6588.482701] svc: socket f45f8e00 sendto([ea53a000 132... ], 131204) = 131204 (addr 192.168.1.100, port=915)
> [ 6588.482727] svc: socket f45f8e00 recvfrom(c7ab109c, 3940) = 156
> [ 6588.482732] svc: TCP complete record (156 bytes)
> [ 6588.482739] svc: transport f45f8e00 served by daemon f6ccd000
> [ 6588.482752] svc: transport f45f8e00 busy, not enqueued
> [ 6588.482766] svc: got len=156
> [ 6588.482781] svc: server f6cca000 waiting for data (to = 900000)
> [ 6588.482787] svc: svc_authenticate (1)
> [ 6588.482798] svc: calling dispatcher
> [ 6588.482806] nfsd_dispatch: vers 3 proc 6
> [ 6588.482831] nfsd: READ(3) 36: 01070001 0141401d 00000000 e12f98aa 1c4965f0 0d4e5b93 131072 bytes at 22151168
> [ 6588.482854] svc: transport f45f8e00 busy, not enqueued
> [ 6588.482870] nfsd: fh_verify(36: 01070001 0141401d 00000000 e12f98aa 1c4965f0 0d4e5b93)
> [ 6588.483499] svc: socket f45f8e00 sendto([cddbc000 132... ], 131204) = 131204 (addr 192.168.1.100, port=915)
> [ 6588.483531] svc: transport f45f8e00 busy, not enqueued
> [ 6588.483543] svc: server de5f6000 waiting for data (to = 900000)
> [ 6588.483639] svc: socket f45f8e00 sendto([f4dbd000 132... ], 131204) = 131204 (addr 192.168.1.100, port=915)
> [ 6588.483667] svc: transport f45f8e00 busy, not enqueued
> [ 6588.483674] svc: server f45d4000 waiting for data (to = 900000)
> [ 6588.483904] svc: socket f45f8e00 sendto([ea445000 132... ], 131204) = 131204 (addr 192.168.1.100, port=915)
> [ 6588.483931] svc: transport f45f8e00 busy, not enqueued
> [ 6588.483937] svc: server de5f0000 waiting for data (to = 900000)
> [ 6588.483987] svc: server f6ccd000, pool 0, transport f45f8e00, inuse=2
> [ 6588.484004] svc: tcp_recv f45f8e00 data 1 conn 0 close 0
> [ 6588.484018] svc: socket f45f8e00 recvfrom(f45f8f70, 0) = 4
> [ 6588.484023] svc: TCP record, 156 bytes
> [ 6588.484036] svc: socket f45f8e00 recvfrom(cdc2f09c, 3940) = 156
>
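A rough way to summarise a trace like this is sketched below in Python, under a few assumptions: the regular expressions are written against the exact 2.6.31-era message formats quoted here (debug output is not a stable kernel interface), each message must sit on a single line (unwrap anything a mail client has broken first), and `summarise` and `decode_record_marker` are hypothetical helpers. In this excerpt every `sendto()` returns the full 131204 bytes it was asked for, i.e. the 131072 bytes of READ data plus the 132 bytes shown as the first iovec, so the server appears to hand complete replies to its TCP socket. In the sunrpc code of this era, "transport ... busy, not enqueued" appears to be logged when another nfsd thread is already servicing that transport, which would make it routine rather than an error on its own. The `= 4` reads just before each "TCP record, 156 bytes" line look like the 4-byte RPC-over-TCP record marker, which the second function decodes.

```python
import re
import struct
from collections import Counter

# Assumed message formats, copied from the trace above; other kernels may differ.
SENDTO_RE = re.compile(r"svc: socket (\w+) sendto\(\[.*?\], (\d+)\) = (-?\d+)")
BUSY_RE = re.compile(r"svc: transport (\w+) busy, not enqueued")
READ_RE = re.compile(r"nfsd: READ\(3\) .* (\d+) bytes at \d+")


def summarise(log_text):
    """Return (short_sends, busy_counts, read_sizes) for an rpcdebug trace."""
    short_sends = []  # (socket, bytes requested, bytes sent) where sent != requested
    busy = Counter()  # "busy, not enqueued" messages per transport
    read_sizes = []   # byte counts requested by NFSv3 READ calls
    for line in log_text.splitlines():
        if m := SENDTO_RE.search(line):
            wanted, sent = int(m.group(2)), int(m.group(3))
            if sent != wanted:
                short_sends.append((m.group(1), wanted, sent))
        elif m := BUSY_RE.search(line):
            busy[m.group(1)] += 1
        elif m := READ_RE.search(line):
            read_sizes.append(int(m.group(1)))
    return short_sends, busy, read_sizes


def decode_record_marker(marker):
    """Decode a 4-byte RPC-over-TCP record marker (RFC 5531 record marking).

    The high bit flags the last fragment; the low 31 bits give its length.
    """
    (word,) = struct.unpack(">I", marker)
    return bool(word & 0x80000000), word & 0x7FFFFFFF
```

Run over the unwrapped excerpt, `summarise` should report no short sends, five "busy" messages for transport f45f8e00, and two 131072-byte READs. If a longer trace captured while the copy hangs looks the same, that would be consistent with replies going missing somewhere below the socket layer (driver or NIC), which is what swapping the LAN back would test. Note that a successful `sendto()` only means the data was queued on the socket, not that it left the machine.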
> While I'm obviously suspicious of the hardware, since that's what changed,
> I can ssh to the server, scp large files between the two machines, and
> share the same directories over Samba without any problems.  On the
> server I can even mount an NFS share locally and manipulate the files
> just fine.  NFS over the network seems to be the only thing giving me
> problems.
>
> Thanks for any help, and please let me know if there's more detail that I 
> can add to assist debugging.
>
> Scott
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html