MDS failover, how to speed it up?

Is anyone here able to help us with a question about MDS failover?

The situation is that we are hitting a bug in Ceph which requires us to restart the MDS every week. 
There is a bug report and PR for it here - https://github.com/ceph/ceph/pull/9456 - but until that is resolved we need to keep doing these restarts, unless there is a better workaround for this bug?
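For reference, the weekly restart/failover itself is nothing special; it is essentially just one of the following (the rank and daemon IDs are placeholders for our actual daemons):

    ceph mds fail 0

or a plain restart of the active daemon on its host:

    sudo systemctl restart ceph-mds@<mds-id>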

The issue we are having is that when we do a failover, the time it takes for the CephFS kernel client to recover is long enough that the VM guests using this CephFS hit timeouts against their storage and therefore remount their filesystems read-only.

We have tried both failing over to another MDS and restarting the MDS while it is the only MDS in the cluster, and in both cases our CephFS kernel clients take too long to recover. 
We have also tried putting the failover MDS into standby-replay ("MDS_STANDBY_REPLAY") mode, which didn't help with this.
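For the standby-replay attempt we had roughly the following in ceph.conf on the standby (the daemon name "b" here is just a placeholder for our actual standby MDS):

    [mds.b]
        mds standby replay = true
        mds standby for rank = 0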

When doing a failover, all IOPS against Ceph are blocked for 2-5 minutes until the kernel CephFS clients recover, after timeout messages like this one:
"2016-06-19 19:09:55.573739 7faaf8f48700  0 log_channel(cluster) log [WRN] : slow request 75.141028 seconds old, received at 2016-06-19 19:08:40.432655: client_request(client.4283066:4164703242 getattr pAsLsXsFs #100000000fe 2016-06-19 19:08:40.429496) currently failed to rdlock, waiting"
After this there is a huge spike in IOPS and data starts being processed again.

I'm not sure whether any of this is related to the following warning, which is present about 90% of the day:
"mds0: Behind on trimming (94/30)"
I have searched the mailing list for clues and answers on what to do about this, but haven't found anything that has helped us. 
We have moved/isolated the MDS service to its own VM with the fastest processor we have, without any real change to this warning.
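Would bumping the MDS journal trimming settings be worth trying here? For example something like the following (the 200/40 values are just a guess on our side, relative to the default mds_log_max_segments of 30 that the 94/30 warning refers to, not values anyone has recommended to us):

    ceph tell mds.0 injectargs '--mds_log_max_segments=200 --mds_log_max_expiring=40'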

Our infrastructure is the following:
 - We use Ceph/CephFS 10.2.1.
 - We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160 PGs).
 - We have one active MDS and one standby MDS.
 - The primary MDS is a virtual machine with 8 cores of E5-2643 v3 @ 3.40 GHz (steal time = 0) and 16 GB of memory.
 - We are using the Ceph kernel client to mount CephFS.
 - Ubuntu 16.04 (4.4.0-22-generic kernel).
 - The OSD hosts are physical machines with 8 cores and 32 GB of memory.
 - All networking is 10 Gb.

So, in the end, is there anything we can do to make the failover and recovery go faster?

Regards,
Brian Lagoni
System administrator, Engineering Tools
Unity Technologies
