[10.2.1] cephfs, mds reliability - client isn't responding to mclientcaps(revoke)

Dear ceph-users...

My team runs an internal build farm using Ceph as the backend storage platform. We’ve recently upgraded to Jewel and are having reliability issues that we need some help with.

Our infrastructure is as follows:
- We use Ceph/CephFS (10.2.1)
- We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160 PGs).
- We use enterprise SSDs for everything, including journals.
- We have one active MDS and one standby MDS.
- We are using the Ceph kernel client to mount CephFS.
- We have upgraded to Ubuntu 16.04 (4.4.0-22-generic kernel).
- We are using kernel NFS to serve NFS clients from a CephFS mount (~32 NFS threads, swappiness set to 0); see the configuration sketch just after this list.
- These are physical machines with 8 cores and 32 GB of memory.
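
For context, the mount and the export look roughly like the lines below. Hostnames, paths, and the secret file location here are illustrative placeholders rather than our exact configuration:

# /etc/fstab entry for the CephFS kernel client mount (monitor addresses are placeholders)
X.X.X.188:6789,X.X.X.189:6789,X.X.X.190:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  2

# /etc/exports entry re-exporting the CephFS mount over kernel NFS
# (fsid= is commonly needed when re-exporting a network filesystem)
/mnt/cephfs  *(rw,sync,no_subtree_check,fsid=1)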

On a regular basis, we lose all I/O via CephFS. We’re still trying to isolate the issue, but it surfaces as a problem between the MDS and the Ceph client.
We can’t tell whether our NFS server is overwhelming the MDS or whether this is some unrelated issue. Tuning the NFS server has not solved it.
So far our only recovery has been to fail the MDS and then restart our NFS server. Any help or advice on the Ceph side of things would be appreciated.
I’m pretty sure we’re running with the default Ceph MDS configuration parameters.
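
For what it’s worth, the recovery we run is roughly the following (rank 0 from our fsmap; the NFS service name is the Ubuntu 16.04 default, so treat this as a sketch rather than our exact runbook):

# Fail the active MDS rank so the standby takes over
ceph mds fail 0

# Wait until the fsmap shows the standby as up:active
ceph -s

# Then restart the kernel NFS server on the gateway
systemctl restart nfs-kernel-server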


Here are the relevant log entries.

From my primary MDS server, I start seeing entries like these pile up:

2016-05-31 14:34:07.091117 7f9f2eb87700  0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000004491 pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877480 seconds ago
2016-05-31 14:34:07.091129 7f9f2eb87700  0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000005ddf pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877382 seconds ago
2016-05-31 14:34:07.091133 7f9f2eb87700  0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000000a2a pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877356 seconds ago
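
If more detail would help, I can gather output from the admin socket on the active MDS. Assuming the commands below exist in Jewel the way I think they do (the daemon name is taken from our fsmap), I would run something like:

# List client sessions, including the NFS gateway (client.4283066 above)
ceph daemon mds.cephfs-03 session ls

# Show MDS requests currently stuck in flight
ceph daemon mds.cephfs-03 dump_ops_in_flight

# MDS performance counters (cache size, journal segments, etc.)
ceph daemon mds.cephfs-03 perf dump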

From my NFS server, I see these dmesg entries also start piling up:
[Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 0 expected 4294967296
[Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 1 expected 4294967296
[Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 2 expected 4294967296

Next, we find something like this in the cluster log on one of the storage nodes:
2016-05-31 14:34:44.130279 mon.0 XX.XX.XX.188:6789/0 1272184 : cluster [INF] HEALTH_WARN; mds0: Client storage-nfs-01 failing to respond to capability release

Finally, I am consistently seeing a HEALTH_WARN about trimming in my status output, which I am not sure is related (a sketch of the log setting I suspect is involved follows the status output):

cluster XXXXXXXX-bd8f-4091-bed3-8586fd0d6b46
     health HEALTH_WARN
            mds0: Behind on trimming (67/30)
     monmap e3: 3 mons at {storage02=X.X.X.190:6789/0,storage03=X.X.X.189:6789/0,storage04=X.X.X.188:6789/0}
            election epoch 206, quorum 0,1,2 storage04,storage03,storage02
      fsmap e74879: 1/1/1 up {0=cephfs-03=up:active}, 1 up:standby
     osdmap e65516: 36 osds: 36 up, 36 in
      pgmap v15435732: 4160 pgs, 3 pools, 37539 GB data, 9611 kobjects
            75117 GB used, 53591 GB / 125 TB avail
                4160 active+clean
  client io 334 MB/s rd, 319 MB/s wr, 5839 op/s rd, 4848 op/s wr
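
My understanding is that the 67/30 in the trimming warning is the MDS journal segment count versus mds_log_max_segments (default 30 in Jewel). If raising it is a reasonable experiment, I assume it would look something like the lines below, but please correct me if this is the wrong knob:

# Check the current value on the active MDS (daemon name from our fsmap)
ceph daemon mds.cephfs-03 config get mds_log_max_segments

# Raise it at runtime; persisting it would mean setting the same values under [mds] in ceph.conf
ceph tell mds.0 injectargs '--mds_log_max_segments 60 --mds_log_max_expiring 40'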


Regards,
James Webb
DevOps Engineer, Engineering Tools
Unity Technologies