Re: NFS Kernel Panics

On Mon, Nov 30, 2015 at 05:20:23PM +0100, Peter Thurner wrote:
> Hi guys,
> 
> I'm running the following Setup on Ubuntu 14.04 for both Server and Clients:

I don't know what kernel version that translates to.

Ideally this would either get reported to Ubuntu, or reproduced with an
upstream kernel before getting reported here.
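
For what it's worth, a quick way to collect that on each machine might
look like the sketch below. As far as I know /proc/version_signature is
an Ubuntu-specific file that maps the distro kernel to an upstream
version, so the function only reads it if it exists. The function name
report_kernel is just for illustration.

```shell
# Print the running kernel release, plus Ubuntu's mapping to the
# upstream version when the distro kernel exposes it.
report_kernel() {
    uname -r
    if [ -r /proc/version_signature ]; then
        cat /proc/version_signature
    fi
}

report_kernel
```

Including that output for both the server and the clients would make a
report easier to act on.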

> 
> 
> == NFS Server with /etc/exports:
> 
> /var/www/ 172.16.1.254(rw,no_root_squash,sync,no_subtree_check)
> 172.16.1.184(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.120(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.193(rw,no_root_squash,sync,no_subtree_check)
> 
> Version: 1:1.2.8-6ubuntu1.2
> 
> 
> == Four NFS Clients with fstab:
> 
> alpha:/var/www        /var/www    nfs4   
> nosharecache,fsc=example_web,noatime,tcp,bg,nosuid,rsize=32768,wsize=32768,soft,proto=tcp   
> 0 0
> 
> On the clients I'm using cachefilesd:
> 
> /var/cache/cachefilesd/loopimage.img       
> /var/cache/cachefilesd/srv    ext4   
> loop,rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered    0
> 0
> 
> root@web1:~# cat /etc/cachefilesd.conf
> dir /var/cache/cachefilesd/srv
> tag nfs_filesystem_cache
> brun 20%
> frun 10%
> bcull 10%
> fcull 7%
> bstop 5%
> fstop 3%
> 
> 
> == Problem
> 
> Both server and clients experience random kernel Panics. Of the five
> machines, around one dies per die.

per day?

> They all run on Amazon AWS as
> m4.large instances. When I set
> 
> rpcdebug -m nfsd -s all
> rpcdebug -m rpc -s all
> 
> The messages before the crash (this time on the NFS server) are:
> 
> ```
> Nov 30 13:49:54 nfs-master kernel: [38232.649545] nfsd_dispatch: vers 4
> proc 1
> Nov 30 13:49:54 nfs-master kernel: [38232.649547] nfsv4 compound op
> #1/3: 22 (OP_PUTFH)
> Nov 30 13:49:54 nfs-master kernel: [38232.649548] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649552] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #1: 22: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649553] nfsv4 compound op
> #2/3: 4 (OP_CLOSE)
> Nov 30 13:49:54 nfs-master kernel: [38232.649554] NFSD: nfsd4_close on
> file objectLinksShadow.png
> Nov 30 13:49:54 nfs-master kernel: [38232.649556] NFSD:
> nfs4_preprocess_seqid_op: seqid=818421 stateid =
> (565bb0a0/00000001/00083f05/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649557] renewing client
> (clientid 565bb0a0/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649558] NFSD:
> move_to_close_lru nfs4_openowner ffff8800373b8000
> Nov 30 13:49:54 nfs-master kernel: [38232.649559] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #2: 4: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649560] nfsv4 compound op
> #3/3: 9 (OP_GETATTR)
> Nov 30 13:49:54 nfs-master kernel: [38232.649562] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649564] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #3: 9: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649565] nfsv4 compound returned 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649570] svc: socket
> ffff8800e929d000 sendto([ffff8801e07ae000 136... ], 136) = 136 (addr
> 172.16.0.120, port=958)
> Nov 30 13:49:54 nfs-master kernel: [38232.649571] svc: server
> ffff880202142000 waiting for data (to = 900000)

This all looks pretty normal to me.

> Nov 30 13:49:54 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="939" x-info="http://www.rsyslog.com"] exiting
> on signal 15.

That's SIGTERM.  No idea if that means anything.
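
The number-to-name mapping can be checked from a shell with the kill
builtin. A small sketch (signal_name is a made-up helper name):

```shell
# Map a signal number to its name using the shell's kill builtin.
signal_name() {
    kill -l "$1"
}

signal_name 15   # prints TERM
```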

Sorry, I don't see anything much to go on here.  Is there a console that
might have anything more?  I'm not very familiar with AWS.
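
If the AWS command line tools are available, I believe the instance's
console output can be fetched with "aws ec2 get-console-output". A
minimal sketch, assuming the CLI is installed with credentials
configured; fetch_console is a made-up helper name and the instance ID
is a placeholder:

```shell
# Fetch whatever the instance last wrote to its console.
# Assumes the AWS CLI is installed and has credentials configured.
fetch_console() {
    aws ec2 get-console-output --instance-id "$1" --output text
}

# Hypothetical instance ID; substitute the real one.
# fetch_console i-0123456789abcdef0 > console.log
```

Any oops or panic trace that never made it to syslog would be the thing
to look for in that output.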

--b.

> Server is rebooting here
> 
> Nov 30 13:50:34 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="951" x-info="http://www.rsyslog.com"] start
> Nov 30 13:50:34 nfs-master rsyslogd-2307: warning: ~ action is
> deprecated, consider using the 'stop' statement instead [try
> http://www.rsyslog.com/e/2307 ]
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's groupid changed to 104
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's userid changed to 101
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpuset
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpu
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpuacct
> 
> ```
> 
> 
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


