We see something very similar on our Ceph cluster, starting as of today.

We use a 16-node, 102-OSD Ceph installation as the basis for an Icehouse OpenStack cluster (we applied the RBD patches for live migration etc.). On this cluster we have a big ownCloud installation (Sync & Share) that stores its files on three NFS servers, each mounting six 2 TB RBD volumes and exposing them to around 10 web server VMs (we originally started with one NFS server and a single 100 TB volume, but that had become unwieldy). All of the servers (hypervisors, Ceph storage nodes and VMs) run Ubuntu 14.04.

Yesterday evening we added 23 OSDs to the cluster, bringing it up to 125 OSDs (we had 4 OSDs that were nearing the 90% full mark). The rebalancing process ended this morning (after around 12 hours) and the cluster has been clean since then:

    cluster b1f3f4c8-xxxxx
     health HEALTH_OK
     monmap e2: 3 mons at {zhdk0009=[yyyy:xxxx::1009]:6789/0,zhdk0013=[yyyy:xxxx::1013]:6789/0,zhdk0025=[yyyy:xxxx::1025]:6789/0}, election epoch 612, quorum 0,1,2 zhdk0009,zhdk0013,zhdk0025
     osdmap e43476: 125 osds: 125 up, 125 in
      pgmap v18928606: 3336 pgs, 17 pools, 82447 GB data, 22585 kobjects
            266 TB used, 187 TB / 454 TB avail
                3319 active+clean
                  17 active+clean+scrubbing+deep
  client io 8186 kB/s rd, 7747 kB/s wr, 2288 op/s

At midnight, we run a script that creates an RBD snapshot of every RBD volume attached to the NFS servers (for backup purposes). Looking at our monitoring, around that time one of the NFS servers became unresponsive and took down the complete ownCloud installation (the load on the web servers was > 200 and they had lost some of the NFS mounts). Rebooting the NFS server solved that problem, but the NFS kernel server kept crashing all day long, after running for anywhere between 10 and 90 minutes.

We initially suspected a corrupt RBD volume, as it seemed we could trigger the kernel crash simply by running 'ls -l' on one of the volumes, but subsequent 'xfs_repair -n' checks on those RBD volumes showed no problems.

Suspecting a problem with the RBD kernel module, we migrated the NFS server off of its hypervisor and rebooted the hypervisor, but the problem persisted (both on the new hypervisor and on the old one when we migrated it back).

We also changed /etc/default/nfs-kernel-server to start 256 nfsd server threads (even though the defaults had been working fine for over a year) - the exact setting is shown below.
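For reference, the only thing we touched in that file was the thread count. The relevant part of /etc/default/nfs-kernel-server now looks like this (everything else is still at the distribution defaults; 14.04 ships a much smaller value, 8 if I remember correctly):

    # /etc/default/nfs-kernel-server (excerpt)
    # number of nfsd kernel threads to start
    RPCNFSDCOUNT=256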
Only one of our 3 NFS servers crashes (see below for the syslog information) - the other 2 have been fine.

May 23 21:44:10 drive-nfs1 kernel: [ 165.264648] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 23 21:44:19 drive-nfs1 kernel: [ 173.880092] NFSD: starting 90-second grace period (net ffffffff81cdab00)
May 23 21:44:23 drive-nfs1 rpc.mountd[1724]: Version 1.2.8 starting
May 23 21:44:28 drive-nfs1 kernel: [ 182.917775] ip_tables: (C) 2000-2006 Netfilter Core Team
May 23 21:44:28 drive-nfs1 kernel: [ 182.958465] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
May 23 21:44:28 drive-nfs1 kernel: [ 183.044091] ip6_tables: (C) 2000-2006 Netfilter Core Team
May 23 21:45:10 drive-nfs1 CRON[1867]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 23 21:45:17 drive-nfs1 collectd[1872]: python: Plugin loaded but not configured.
May 23 21:45:17 drive-nfs1 collectd[1872]: Initialization complete, entering read-loop.
May 23 21:47:11 drive-nfs1 kernel: [ 346.392283] init: plymouth-upstart-bridge main process ended, respawning
May 23 21:51:26 drive-nfs1 kernel: [ 600.776177] INFO: task nfsd:1696 blocked for more than 120 seconds.
May 23 21:51:26 drive-nfs1 kernel: [ 600.778090] Not tainted 3.13.0-53-generic #89-Ubuntu
May 23 21:51:26 drive-nfs1 kernel: [ 600.779507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 21:51:26 drive-nfs1 kernel: [ 600.781504] nfsd            D ffff88013fd93180     0  1696      2 0x00000000
May 23 21:51:26 drive-nfs1 kernel: [ 600.781508] ffff8800b2391c50 0000000000000046 ffff8800b22f9800 ffff8800b2391fd8
May 23 21:51:26 drive-nfs1 kernel: [ 600.781511] 0000000000013180 0000000000013180 ffff8800b22f9800 ffff880035f48240
May 23 21:51:26 drive-nfs1 kernel: [ 600.781513] ffff880035f48244 ffff8800b22f9800 00000000ffffffff ffff880035f48248
May 23 21:51:26 drive-nfs1 kernel: [ 600.781515] Call Trace:
May 23 21:51:26 drive-nfs1 kernel: [ 600.781523] [<ffffffff81727749>] schedule_preempt_disabled+0x29/0x70
May 23 21:51:26 drive-nfs1 kernel: [ 600.781526] [<ffffffff817295b5>] __mutex_lock_slowpath+0x135/0x1b0
May 23 21:51:26 drive-nfs1 kernel: [ 600.781528] [<ffffffff8172964f>] mutex_lock+0x1f/0x2f
May 23 21:51:26 drive-nfs1 kernel: [ 600.781557] [<ffffffffa03b1761>] nfsd_lookup_dentry+0xa1/0x490 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781568] [<ffffffffa03b044b>] ? fh_verify+0x14b/0x5e0 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781591] [<ffffffffa03b1bb9>] nfsd_lookup+0x69/0x130 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781613] [<ffffffffa03be90a>] nfsd4_lookup+0x1a/0x20 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781628] [<ffffffffa03c055a>] nfsd4_proc_compound+0x56a/0x7d0 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781638] [<ffffffffa03acd3b>] nfsd_dispatch+0xbb/0x200 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781662] [<ffffffffa028762d>] svc_process_common+0x46d/0x6d0 [sunrpc]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781678] [<ffffffffa0287997>] svc_process+0x107/0x170 [sunrpc]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781687] [<ffffffffa03ac71f>] nfsd+0xbf/0x130 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781696] [<ffffffffa03ac660>] ? nfsd_destroy+0x80/0x80 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [ 600.781702] [<ffffffff8108b6b2>] kthread+0xd2/0xf0
May 23 21:51:26 drive-nfs1 kernel: [ 600.781707] [<ffffffff8108b5e0>] ? kthread_create_on_node+0x1c0/0x1c0
May 23 21:51:26 drive-nfs1 kernel: [ 600.781712] [<ffffffff81733868>] ret_from_fork+0x58/0x90
May 23 21:51:26 drive-nfs1 kernel: [ 600.781717] [<ffffffff8108b5e0>] ? kthread_create_on_node+0x1c0/0x1c0

Before each crash, we see the disk utilization of one or two random mounted RBD volumes go to 100% - there is no pattern to which of the RBD disks starts to act up (see the iostat note further down for how we watch this). We have scoured the log files of the Ceph cluster for any sign of problems but came up empty. The NFS server has almost no load (compared to regular usage), as most sync clients are either turned off (weekend) or have given up connecting to the server.

There haven't been any configuration changes on the NFS servers prior to the problems. The only change was the addition of the 23 OSDs.

We use ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3).

Our team is completely out of ideas. We have removed the 100 TB volume from the NFS server (we used the downtime to migrate the last of its data to one of the smaller volumes). The NFS server has been running for 30 minutes now (with close to no load), but we don't really expect it to make it until tomorrow.
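A note on what "disk utilization" means above: our graphs come from our monitoring, but the same picture shows up when watching the mapped RBD devices on the NFS server by hand, e.g. with iostat (this is just an illustration of what we look at, not a special tool):

    # extended per-device statistics, refreshed every 5 seconds; the affected volume
    # shows up as an rbdN device with %util pinned at 100 while its read/write
    # throughput drops to nearly zero
    iostat -x 5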
send help

Jens-Christian

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fischer at switch.ch
http://www.switch.ch
http://www.switch.ch/stories

On 23.05.2015, at 20:38, John-Paul Robinson (Campus) <jpr at uab.edu> wrote:

> We've had an NFS gateway serving up RBD images successfully for over a year. Ubuntu 12.04 and ceph .73 iirc.
>
> In the past couple of weeks we have developed a problem where the NFS clients hang while accessing exported RBD containers.
>
> We see errors on the server about nfsd hanging for 120 sec etc.
>
> The NFS server is still able to successfully interact with the images it is serving. We can export non-RBD shares from the local file system and NFS clients can use them just fine.
>
> There seems to be something weird going on with the rbd and nfs kernel modules.
>
> Our Ceph pool is in a warn state due to an OSD rebalance that is continuing slowly. But the fact that we continue to have good RBD image access directly on the server makes me think this is not related. Also, the NFS server is only a client of the pool; it doesn't participate in it.
>
> Has anyone experienced similar issues?
>
> We do have a lot of images attached to the server, but the issue is there even when we map only a few.
>
> Thanks for any pointers.
>
> ~jpr