On Tue, Oct 6, 2015 at 2:21 PM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:
> John Spray wrote:
>>
>> On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
>> <mahatma@xxxxxxxxxxxxxx> wrote:
>>>
>>> Even now I remove "mds standby replay = true":
>>> e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
>>> Cluster stuck on KILL active mds.b. How to correctly stop mds to get
>>> behaviour like on MONs - leader->down/peon->leader?
>>
>>
>> It's not clear to me why you're saying it's stuck. Is it stuck, or is it
>> slow?
>
>
> It totally sleep (stuck) up to HEALTH_OK (to rejoin complete). Not slow.
> "mds cluster degraded".

Okay, so if I understand you correctly, "it" means the client IO. The
MDS cluster isn't stuck, but client metadata operations are blocked
while the MDS cluster is degraded. That is expected behaviour.

The idea is that MDS failover should be rare and quick enough that the
interruption to client IO isn't a problem, so the interesting part is
finding out why the failover isn't happening quickly enough.

Next time you go through this process, turn up the MDS debug logs (if
10 is too verbose for your system, maybe just set to 7 or so), and also
capture the relevant section of the cluster log (i.e. the ceph.log) so
that we can see how ranks are being assigned during the failover event.
That would give us enough information to know why this is taking longer
than it should.

>> What special actions are you having to perform? It looks like your
>> cluster is coming back online eventually?
>
>
> I don't test while, something like:
> ceph mds stop <who>
> ceph mds deactivate <who>
> ceph mds tell <who> <args> [<args>...]
> - before KILL
>
> - something to tell mds to release "active" status and move it to another.
> Also I look to "mds shutdown check = <int>" (?).
> Or fix mds to do it on KILL if nothing this.

I see that you've listed some commands, but I'm not sure I understand
what action you're actually taking here?
If you're looking for the command that notifies ceph that an MDS daemon
is gone for good and another daemon should take over, it's "ceph mds
fail <rank>".

John
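
For the debug logging suggested above, one way to set it persistently is
a ceph.conf fragment like the one below on the MDS hosts. This is only a
sketch: the level 10 is the value mentioned above (7 being the quieter
alternative), and it assumes the MDS daemons are restarted afterwards so
that they pick up the new setting.

```ini
[mds]
    # Verbose MDS logging for diagnosing a slow failover.
    # Drop this to 7 if 10 produces too much output on your system.
    debug mds = 10
```

Revert the setting once the logs from the failover have been captured.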