Re: avoid 3-mds fs laggy on 1 rejoin?


 



John Spray wrote:
On Tue, Oct 6, 2015 at 2:21 PM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:
John Spray wrote:

On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
<mahatma@xxxxxxxxxxxxxx> wrote:

Even now, after removing "mds standby replay = true":
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
the cluster gets stuck when I KILL the active mds.b. How do I correctly stop an mds so that
it behaves like the MONs do (leader->down, peon->leader)?


It's not clear to me why you're saying it's stuck.  Is it stuck, or is it
slow?


It is completely stuck (not slow) until HEALTH_OK returns, i.e. until rejoin completes.
The whole time the status shows "mds cluster degraded".

Okay, so if I understand you correctly, "it" means the client IO.  The
MDS cluster isn't stuck, but client metadata operations are blocked
while the MDS cluster is degraded.  That is expected behaviour.

Yes, "it" = IO ;). Simply put: "ls /mnt/ceph" waits until the "HEALTH_WARN: mds cluster degraded" state ends ("ceph health detail" says the node is in "rejoin", and the log shows the same).
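
(For reference, the usual commands to watch this from another terminal:)

ceph health detail
ceph mds stat
ceph -w
- "health detail" names the rank that is stuck in rejoin, "mds stat" shows the per-rank states, and "ceph -w" follows the cluster log while the failover runs.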


The idea is that MDS failover should be rare and quick enough that the
interruption to client IO isn't a problem, so the interesting part is
finding out why the failover isn't happening quickly enough.

Next time you go through this process, turn up the MDS debug logs (if
10 is too verbose for your system, maybe just set to 7 or so), and
also capture the relevant section of the cluster log (i.e. the
ceph.log) so that we can see how ranks are being assigned during the
failover event.  That would give us enough information to know why
this is taking longer than it should.
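
Roughly, something like this should do it (assuming mds.b is the daemon involved and the default log locations under /var/log/ceph/):

ceph tell mds.b injectargs '--debug-mds 10 --debug-ms 1'
- or set "debug mds = 10" under [mds] in ceph.conf and restart the daemon.
Then, after the failover, grab /var/log/ceph/ceph-mds.b.log from the MDS host and /var/log/ceph/ceph.log (the cluster log) from a mon host.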


What special actions are you having to perform?  It looks like your
cluster is coming back online eventually?


I have not tested it yet, but I mean something like:
ceph mds stop <who>
ceph mds deactivate <who>
ceph mds tell <who> <args> [<args>...]
- run before the KILL

- i.e. something that tells the mds to release its "active" status and hand it over to another daemon.
I am also looking at "mds shutdown check = <int>" (?).
Or, if none of that exists, fix the mds to do it itself on KILL.

I see that you've listed some commands, but I'm not sure I understand
what action you're actually taking here?  If you're looking for the
command that notifies ceph that an MDS daemon is gone for good and
another daemon should take over, it's "ceph mds fail <rank>".
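
For example, if mds.b currently holds rank 0, something like:

ceph mds fail 0
ceph mds stat
- the standby should then go through replay/rejoin and become active.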

I want to tell the mds to delegate its "active" state to another daemon as part of a normal shutdown (deactivation) process, if that is possible. Similarly, "ceph osd down" sometimes helps before a kill to avoid misbehaviour (on a busy node the mons can mark the osd down immediately instead of waiting for the timeout). As I understand it, "standby-replay" is supposed to keep another mds ready for fast activation, but in theory the same handover (plus a wait) could also happen if the mds state were replicated only at shutdown.
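
(The OSD analogy, roughly - assuming osd.3 on the node being stopped, and sysvinit-style service scripts:)

ceph osd down 3
service ceph stop osd.3
- marking it down first means the mons do not wait for the heartbeat timeout before the cluster reacts.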


John



--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



