Re: MDS failover, how to speed it up?

I will plan to add more logging and other info you have asked for at the next MDS restart. 

As this cluster is used in production, I have a limited maintenance window, so unless I find time outside this window you will have to wait until Sunday/Monday for the logs.

@John, yes, I have used "ceph mds fail <ID>", but I would like to do it again with a bit more logging, just to be sure.
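For reference, roughly what I plan to run; the daemon name "a" and the debug levels below are placeholders, not our actual settings:

```shell
# Raise MDS debug logging at runtime (daemon name "a" is a placeholder),
# trigger the failover, then restore the defaults afterwards.
ceph tell mds.a injectargs '--debug_mds 10 --debug_ms 1'
ceph mds fail a
# logs end up in /var/log/ceph/ceph-mds.a.log on the MDS host
ceph tell mds.a injectargs '--debug_mds 1 --debug_ms 0'
```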

@Zheng, it might be due to pressure on the MDS server, but I don't see a critically high load on it (~0.4), and I see ~90 Mbit of traffic to and from the MDS on average.
Also an extra question: when doing a "df -i" on the CephFS mountpoint, I get a high inode count, which looks like it's the total across all the OSDs combined divided by the number of replicas. Is this assumption correct?
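To illustrate the assumption with made-up numbers (all values below are hypothetical, not taken from our cluster):

```python
# Hypothetical illustration of the assumption: "df -i" on a CephFS mount
# reports roughly the total object count summed over all OSDs, divided
# by the replication factor.
osd_object_counts = [1_200_000, 1_150_000, 1_250_000]  # made-up per-OSD counts
replicas = 3

estimated_inodes = sum(osd_object_counts) // replicas
print(estimated_inodes)  # 1200000
```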

Please let me know if any more info is needed.

Regards

On 20 June 2016 at 14:09, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Mon, Jun 20, 2016 at 7:04 PM, Brian Lagoni <brianl@xxxxxxxxxxx> wrote:
> Is anyone here able to help us with a question about MDS failover?
>
> The case is that we are hitting a bug in Ceph which requires us to restart
> the MDS every week.
> There is a bug report and PR for it here - https://github.com/ceph/ceph/pull/9456
> - but until this has been resolved we need to do a restart. Unless there is
> a better workaround for this bug?
>
> The issue we are having is that when we do a failover, the time it takes for
> the CephFS kernel client to recover is long enough that the VM guests using
> this CephFS hit timeouts to their storage and therefore enter read-only
> mode.
>
> We have tried failing over to another MDS and restarting the MDS while it's
> the only MDS in the cluster, and in both cases our CephFS kernel client
> takes too long to recover.
> We have also tried setting the failover MDS into "MDS_STANDBY_REPLAY" mode,
> which didn't help on this matter.
>
> When doing a failover, all IOPS against Ceph are blocked for 2-5 minutes
> until the kernel CephFS clients recover, after some timeout messages like
> these:
> "2016-06-19 19:09:55.573739 7faaf8f48700  0 log_channel(cluster) log [WRN] :
> slow request 75.141028 seconds old, received at 2016-06-19 19:08:40.432655:
> client_request(client.4283066:4164703242 getattr pAsLsXsFs #100000000fe
> 2016-06-19 19:08:40.429496) currently failed to rdlock, waiting"
> After this there is a huge spike in IOPS and data starts being processed
> again.
>
> I'm not sure if any of this can be related to this warning, which is present
> 90% of the day:
> "mds0: Behind on trimming (94/30)"
> I have searched the mailing list for clues and answers on what to do about
> this but haven't found anything which has helped us.
> We have moved/isolated the MDS service to its own VM with the fastest
> processor we have, without any real change to this warning.
>
>  Our infrastructure is the following:
>  - We use Ceph/CephFS (10.2.1)
>  - We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160 PGs).
>  - We have one main MDS and one standby MDS.
>  - The primary MDS is a virtual machine with an 8-core E5-2643 v3 @
> 3.40GHz (steal time = 0), 16G mem
>  - We are using the Ceph kernel client to mount CephFS.
>  - Ubuntu 16.04 (4.4.0-22-generic kernel)
>  - The OSDs are physical machines with 8 cores & 32GB memory
>  - All networking is 10Gb
>
> So, in the end, is there anything we can do to make the failover and
> recovery go faster?

I guess your MDS is very busy and there are lots of inodes in the client
cache. Please run 'ceph daemon mds.xxx session ls' before restarting
the MDS, and send the output to us.
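For example, something like this on the MDS host ("mds.xxx" is a placeholder for your actual daemon ID):

```shell
# Capture the client session list (including per-client state) via the
# admin socket before restarting the MDS; "mds.xxx" is a placeholder.
ceph daemon mds.xxx session ls > session-ls-before-restart.json
```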

Regards
Yan, Zheng


>
> Regards,
> Brian Lagoni
> System administrator, Engineering Tools
> Unity Technologies
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

