CentOS5.4 x86_64 server with intermittent mount and read/write problems (BUG: soft lockup, debugging)


Hi All,

I have an NFS server that comes under heavy load from time to time. Today some clients have been unable to mount the export, and other clients that already have it mounted report the following when reading or writing: kernel: nfs: server someserver not responding, timed out

There are no error logs being produced on the server, and only these messages below on the client.

I presume this is a load issue, and I would be keen to identify which component is responsible so I can either upgrade or move some files elsewhere.

There appears to be a kernel upgrade available in the repos, and some have suggested setting the kernel options noapic and apic=off for the next boot, which I am going to try tomorrow.

Any suggestions on how to determine the choking point on this server?

Many Thanks
Tom


# uptime
04:59:17 up 97 days, 19:29,  2 users,  load average: 3.73, 3.47, 3.37

# free -m
            total       used       free     shared    buffers     cached
Mem:         12007      11944         63          0       1308       9645
-/+ buffers/cache:        990      11017
Swap:        12095          0      12095


On the clients I am seeing:
# mount -vv -t nfs -o soft servername:/mount/processed34 /mount/processed34
mount: trying 198.nn.nn.nn prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server servername failed: timed out (retrying).
mount: trying 198.nn.nn.nn prog 100003 vers 3 prot tcp port 2049
mount: mount to NFS server 'servername' failed: timed out (giving up).

and for the clients that have mounted the fs;
kernel: nfs: server someserver not responding, timed out


I am seeing errors like this in /var/log/messages:
# grep BUG messages
Jul 23 03:18:21 servername kernel: BUG: soft lockup - CPU#5 stuck for 11s! [nfsd:20540]
Jul 23 03:22:09 servername kernel: BUG: soft lockup - CPU#1 stuck for 15s! [nfsd:20545]
(repeated many times)


Retransmission stats from a client that was having problems:
# nfsstat -c -3
Client rpc stats:
calls      retrans    authrefrsh
25536750   209        0
Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 1467584   6% 2702      0% 16868358 69% 746134    3% 0         0%
read         write        create       mkdir        symlink      mknod
337417    1% 683048    2% 24049     0% 91        0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
10923     0% 86        0% 4844      0% 0         0% 158020    0% 3805702  15%
fsstat       fsinfo       pathconf     commit
156       0% 24        0% 0         0% 25625     0%
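
For scale, the retransmission rate implied by the client counters above can be worked out with a short helper. This is only a rough sketch: retrans_pct is a hypothetical function name (not part of nfs-utils), and its two arguments are simply the calls and retrans values copied from the client output.

```shell
#!/bin/sh
# retrans_pct CALLS RETRANS: print retransmissions as a percentage of RPC calls.
# Hypothetical helper for illustration; the inputs are copied by hand from nfsstat.
retrans_pct() {
    awk -v calls="$1" -v retrans="$2" 'BEGIN {
        # Avoid dividing by zero on a client that has made no calls yet.
        if (calls == 0) { print "n/a"; exit }
        printf "%.4f\n", 100 * retrans / calls
    }'
}

# Values from the "Client rpc stats" section above.
retrans_pct 25536750 209
```

With these numbers the helper prints 0.0008, i.e. well under 0.001% of calls were retransmitted. The counters are cumulative, so a small overall rate could still hide a recent burst.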

server nfs stats;
# nfsstat -s -3
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
342989209  1397       1397       0          0
Server nfs v3:
null          getattr       setattr       lookup        access        readlink
4169       0% 1951757    0% 1241244    0% 68875972  20% 2180568    0% 0          0%
read          write         create        mkdir         symlink       mknod
18863      0% 144562881 42% 61482189  17% 456758     0% 0          0% 0          0%
remove        rmdir         rename        link          readdir       readdirplus
8119668    2% 35533      0% 1          0% 0          0% 46596      0% 1805329    0%
fsstat        fsinfo        pathconf      commit
545        0% 3379       0% 2          0% 51641657  15%
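
As a rough check on where the server's work is going, the write-type operations (write, create, commit) can be summed as a share of all calls. This is a sketch under stated assumptions: op_share is a hypothetical helper name, and it uses the server RPC calls total as an approximate denominator rather than the exact sum of the v3 operations.

```shell
#!/bin/sh
# op_share TOTAL COUNT...: print the summed COUNTs as a percentage of TOTAL.
# Hypothetical helper for illustration; the inputs are copied by hand from nfsstat.
op_share() {
    total="$1"; shift
    # One count per line, summed by awk.
    printf '%s\n' "$@" | awk -v total="$total" '
        { sum += $1 }
        END {
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * sum / total
        }'
}

# write, create and commit counts against the server RPC calls total.
op_share 342989209 144562881 61482189 51641657
```

With the figures above this prints 75.1, so roughly three quarters of the server's calls are write, create or commit, against almost no reads.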






