NFS Kernel Panics

Peter Thurner <p.thurner@xxxxxxxxxx> · Mon, 30 Nov 2015 17:20:23 +0100

Hi guys,

I'm running the following setup on Ubuntu 14.04 on both the server and the clients:

== NFS Server with /etc/exports:

/var/www/ 172.16.1.254(rw,no_root_squash,sync,no_subtree_check) 172.16.1.184(rw,no_root_squash,sync,no_subtree_check) 172.16.0.120(rw,no_root_squash,sync,no_subtree_check) 172.16.0.193(rw,no_root_squash,sync,no_subtree_check)

Version: 1:1.2.8-6ubuntu1.2

== Four NFS Clients with fstab:

alpha:/var/www  /var/www  nfs4  nosharecache,fsc=example_web,noatime,tcp,bg,nosuid,rsize=32768,wsize=32768,soft,proto=tcp  0 0

On the clients I'm using cachefilesd:

/var/cache/cachefilesd/loopimage.img  /var/cache/cachefilesd/srv  ext4  loop,rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered  0 0

root@web1:~# cat /etc/cachefilesd.conf
dir /var/cache/cachefilesd/srv
tag nfs_filesystem_cache
brun 20%
frun 10%
bcull 10%
fcull 7%
bstop 5%
fstop 3%
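
As a sanity check on the culling limits above, here is a minimal Python sketch. It assumes the ordering rules given in the kernel's CacheFiles documentation (0 <= bstop < bcull < brun < 100, and the same for the f* limits). The helper names parse_limits and check_cull_limits are made up for illustration and are not part of cachefilesd.

```python
import re

# The posted cachefilesd.conf, reproduced so the sketch is self-contained.
CONF = """\
dir /var/cache/cachefilesd/srv
tag nfs_filesystem_cache
brun 20%
frun 10%
bcull 10%
fcull 7%
bstop 5%
fstop 3%
"""

# Matches lines such as "brun 20%" and captures the name and the number.
LIMIT_RE = re.compile(r"^\s*([bf](?:run|cull|stop))\s+(\d+)%\s*$")


def parse_limits(conf_text):
    """Collect the percentage limits (brun, frun, ...) as integers."""
    limits = {}
    for line in conf_text.splitlines():
        match = LIMIT_RE.match(line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


def check_cull_limits(limits):
    """Return a list of problems; an empty list means the ordering holds."""
    problems = []
    for prefix in ("b", "f"):
        names = [prefix + suffix for suffix in ("stop", "cull", "run")]
        values = [limits.get(name) for name in names]
        if None in values:
            problems.append("missing one of: " + ", ".join(names))
            continue
        stop, cull, run = values
        # Assumed rule: 0 <= stop < cull < run < 100
        if not (0 <= stop < cull < run < 100):
            problems.append(
                "expected 0 <= {} < {} < {} < 100, got {} {} {}".format(
                    *names, stop, cull, run
                )
            )
    return problems


print(check_cull_limits(parse_limits(CONF)))  # prints [] for the config above
```

The values posted here pass that check, so under this assumption the thresholds themselves are consistent.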

== Problem

Both the server and the clients experience random kernel panics. Of the
five machines, around one dies per day. They all run on Amazon AWS as
m4.large instances. When I set

rpcdebug -m nfsd -s all
rpcdebug -m rpc -s all

the messages logged before the crash (this time on the NFS server) are:

```
Nov 30 13:49:54 nfs-master kernel: [38232.649545] nfsd_dispatch: vers 4 proc 1
Nov 30 13:49:54 nfs-master kernel: [38232.649547] nfsv4 compound op #1/3: 22 (OP_PUTFH)
Nov 30 13:49:54 nfs-master kernel: [38232.649548] nfsd: fh_verify(32: 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
Nov 30 13:49:54 nfs-master kernel: [38232.649552] nfsv4 compound op ffff8802026c8080 opcnt 3 #1: 22: status 0
Nov 30 13:49:54 nfs-master kernel: [38232.649553] nfsv4 compound op #2/3: 4 (OP_CLOSE)
Nov 30 13:49:54 nfs-master kernel: [38232.649554] NFSD: nfsd4_close on file objectLinksShadow.png
Nov 30 13:49:54 nfs-master kernel: [38232.649556] NFSD: nfs4_preprocess_seqid_op: seqid=818421 stateid = (565bb0a0/00000001/00083f05/00000001)
Nov 30 13:49:54 nfs-master kernel: [38232.649557] renewing client (clientid 565bb0a0/00000001)
Nov 30 13:49:54 nfs-master kernel: [38232.649558] NFSD: move_to_close_lru nfs4_openowner ffff8800373b8000
Nov 30 13:49:54 nfs-master kernel: [38232.649559] nfsv4 compound op ffff8802026c8080 opcnt 3 #2: 4: status 0
Nov 30 13:49:54 nfs-master kernel: [38232.649560] nfsv4 compound op #3/3: 9 (OP_GETATTR)
Nov 30 13:49:54 nfs-master kernel: [38232.649562] nfsd: fh_verify(32: 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
Nov 30 13:49:54 nfs-master kernel: [38232.649564] nfsv4 compound op ffff8802026c8080 opcnt 3 #3: 9: status 0
Nov 30 13:49:54 nfs-master kernel: [38232.649565] nfsv4 compound returned 0
Nov 30 13:49:54 nfs-master kernel: [38232.649570] svc: socket ffff8800e929d000 sendto([ffff8801e07ae000 136... ], 136) = 136 (addr 172.16.0.120, port=958)
Nov 30 13:49:54 nfs-master kernel: [38232.649571] svc: server ffff880202142000 waiting for data (to = 900000)
Nov 30 13:49:54 nfs-master rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="939" x-info="http://www.rsyslog.com"] exiting on signal 15.

Server is rebooting here

Nov 30 13:50:34 nfs-master rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="951" x-info="http://www.rsyslog.com"] start
Nov 30 13:50:34 nfs-master rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's groupid changed to 104
Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's userid changed to 101
Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup subsys cpuset
Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup subsys cpu
Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup subsys cpuacct
```
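
To see at a glance which NFSv4 operations made up the last compound before the reboot, the debug output can be summarised with a short script. This is only a sketch: the function name compound_ops and the regular expression are assumptions based on the log lines shown here, not on any documented nfsd log format.

```python
import re

# Matches e.g. "nfsv4 compound op #1/3: 22 (OP_PUTFH)". Using \s+ lets the
# pattern tolerate lines that were wrapped by a mail client.
OP_RE = re.compile(r"nfsv4 compound op\s+#(\d+)/(\d+):\s+(\d+)\s+\((OP_\w+)\)")


def compound_ops(log_text):
    """Return (index, total, opnum, name) for each compound op start line."""
    return [
        (int(index), int(total), int(opnum), name)
        for index, total, opnum, name in OP_RE.findall(log_text)
    ]


# Three lines taken from the log in this post.
SAMPLE = """\
Nov 30 13:49:54 nfs-master kernel: [38232.649547] nfsv4 compound op
#1/3: 22 (OP_PUTFH)
Nov 30 13:49:54 nfs-master kernel: [38232.649553] nfsv4 compound op
#2/3: 4 (OP_CLOSE)
Nov 30 13:49:54 nfs-master kernel: [38232.649560] nfsv4 compound op
#3/3: 9 (OP_GETATTR)
"""

for index, total, opnum, name in compound_ops(SAMPLE):
    print(f"{index}/{total}: {name} ({opnum})")
```

For the excerpt in this post it lists PUTFH, CLOSE and GETATTR in that order, all of which returned status 0 according to the log.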
