On Mon, Jun 20, 2016 at 7:04 PM, Brian Lagoni <brianl@xxxxxxxxxxx> wrote:
> Is anyone here able to help us with a question about mds failover?
>
> The case is that we are hitting a bug in ceph which requires us to restart
> the mds every week.
> There is a bug and a PR for it here - https://github.com/ceph/ceph/pull/9456 -
> but until this has been resolved we need to do a restart, unless there is a
> better workaround for this bug?
>
> The issue we are having is that when we do a failover, the time it takes for
> the cephfs kernel clients to recover is long enough that the VM guests using
> this cephfs hit timeouts to their storage and therefore enter read-only mode.
>
> We have tried failing over to another mds, and restarting the mds while it's
> the only mds in the cluster, and in both cases our cephfs kernel clients take
> too long to recover.
> We have also tried setting the failover MDS into "MDS_STANDBY_REPLAY" mode,
> which didn't help with this.
>
> When doing a failover, all IOPS against ceph are blocked for 2-5 min until
> the kernel cephfs clients recover, after some timeout messages like these:
> "2016-06-19 19:09:55.573739 7faaf8f48700 0 log_channel(cluster) log [WRN] :
> slow request 75.141028 seconds old, received at 2016-06-19 19:08:40.432655:
> client_request(client.4283066:4164703242 getattr pAsLsXsFs #100000000fe
> 2016-06-19 19:08:40.429496) currently failed to rdlock, waiting"
> After this there is a huge spike in IOPS and data starts being processed
> again.
>
> I'm not sure if any of this can be related to this warning, which is present
> 90% of the day:
> "mds0: Behind on trimming (94/30)"?
> I have searched the mailing list for clues and answers on what to do about
> this but haven't found anything that has helped us.
> We have moved/isolated the MDS service to its own VM with the fastest
> processor we have, without any real change to this warning.
>
> Our infrastructure is the following:
> - We use CEPH/CEPHFS (10.2.1).
> - We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160 PGs).
> - We have one main mds and one standby mds.
> - The primary MDS is a virtual machine with an 8-core E5-2643 v3 @
>   3.40GHz (steal time=0) and 16 GB of memory.
> - We are using the ceph kernel client to mount cephfs.
> - Ubuntu 16.04 (4.4.0-22-generic kernel).
> - The OSDs are physical machines with 8 cores & 32 GB of memory.
> - All networking is 10Gb.
>
> So, in the end, is there anything we can do to make the failover and
> recovery go faster?

I guess your MDS is very busy and there are lots of inodes in the client
cache. Please run 'ceph daemon mds.xxx session ls' before restarting the
MDS, and send the output to us.

Regards
Yan, Zheng

>
> Regards,
> Brian Lagoni
> System administrator, Engineering Tools
> Unity Technologies
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
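
For reference, a minimal sketch of how the requested output could be captured
on the host running the active MDS before the restart. The daemon name "mds-a"
and the jq summary of a per-session "num_caps" field are assumptions added
here for illustration, not something stated in the thread; substitute your own
MDS name and check the field names against your actual output:

    # Dump the client session list from the MDS admin socket and save it,
    # so it can be attached to a reply (this is the command Zheng asked for).
    sudo ceph daemon mds.mds-a session ls > /tmp/mds-session-ls.json

    # Optionally summarise how many capabilities (cached inodes) each client
    # holds, assuming each session entry exposes "inst" and "num_caps" fields.
    jq '[.[] | {client: .inst, caps: .num_caps}] | sort_by(-.caps)' \
        /tmp/mds-session-ls.json

A very large cap count on one or a few clients would be consistent with
Zheng's guess that the slow recovery comes from clients holding a lot of
cached inodes that the MDS has to work through after the restart.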